MFSGrp-package | R Documentation
This R package fits Group Elastic Net regressions (including group lasso, group ridge, group elastic net, and ordinary least squares) with scalar response values and observed functional covariates. In addition, it penalizes the curvature of the output through a penalty on the second derivative of the estimated coefficient curves. The package offers two algorithms. The first is ADMM (mostly developed in C++); it is designed for parallel computation and is recommended only on systems equipped with many strong cores. It runs in parallel on Linux but serially on Windows. The second algorithm uses the fGMD package, which was built exclusively for this package. fGMD is a heavily modified version of the gglasso package (BMD and Strong Rule). The features added to the original gglasso are: the mixing parameter (alpha) and its net-search cross-validation, the curvature penalty for functional regression and its net-search cross-validation, an optimized Fortran core function that accelerates the curvature-penalty updates, and progress reports with time estimates. For this package to work, first install fGMD. The fGMD package does not work independently of this package, and it does not interfere with the functions of the gglasso package thanks to slight name differences.
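For instance, on a many-core Linux machine the ADMM solver can be requested explicitly. The following is a minimal sketch, assuming the inputs are prepared as in the Examples section and that numcores sets the number of parallel workers (an assumption based on the argument name; this page does not document it):

# Sketch: request the ADMM solver on a many-core Linux system
# (Ytrain, Xtrain, m, part, tt prepared as in the Examples below)
results <- MFSGrp(Ytrain, Xtrain, basisno = m, tt, part = part,
                  Penalty = "glasso", ADMM = TRUE,
                  numcores = 8)  # assumed: number of parallel cores used by ADMM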
MFSGrp(
  Ytrain, Xtrain, basisno = 5, tt, lambda = NULL, alpha = NULL, part,
  Xpred = NULL, Ypred = NULL, Silence = FALSE, bspline = FALSE,
  Penalty = NULL, lambdafactor = 0.005, nfolds = 5, predloss = "L2",
  eps = 1e-08, maxit = 3e+08, nlambda = 100, forcezero = FALSE,
  forcezeropar = 0.001, sixplotnum = 1, lambdaderivative = NULL,
  nfolder = 5, nalpha = 9, nlamder = 10, lamdermin = 1e-9,
  lamdermax = 1e-3, alphamin = 0, alphamax = 1, a = 3.7, ADMM = FALSE,
  numcores = NULL, rho = 1, unbalanced = FALSE
)
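A minimal sketch of a call, leaving lambda, alpha, and lambdaderivative to their built-in cross-validations (Ytrain, Xtrain, and tt are assumed to be prepared as described under the arguments below):

m <- 5                                 # basisno: basis functions per covariate
p <- dim(Xtrain)[1]                    # number of functional covariates
results <- MFSGrp(Ytrain, Xtrain, basisno = m, tt = tt,
                  part = rep(m, p),    # one group of m coefficients per covariate
                  Penalty = "glasso")  # group lasso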
Ytrain |
a vector of scalar response values of size n in the training set.
Xtrain |
a three-dimensional array of p functional covariates observed at nt time points and sampled n times in the training set. The size of the array is p*n*nt. It is recommended that the time series signals be of the same type. For instance, in an fMRI data set they should all be BOLD-contrast activities of the human brain, recorded nt times over a period T at p Regions of Interest of the brain atlas for n human subjects. In an economic data set, the covariates should be time series that are all measured in percentages or all measured in per capita units. (For the input layout, see the sketch after this argument list.)
basisno |
the number of basis functions (Fourier or B-spline) used to convert the observed functional covariates (time series signals) into their functional objects.
tt |
a vector of size nt containing grid points within 0 to 1. It represents the observation times over the whole period T, rescaled to the interval (0,1). When unbalanced=TRUE, it is a list of such vectors, one per covariate (see the examples).
lambda |
the regularization parameter of the (group) lasso penalty term (given alpha=0). By leaving this argument unspecified, the program computes the optimized value by cross-validation (see nfolds, nlambda, and lambdafactor).
alpha |
the Elastic Net mixing parameter; alpha=0 gives the (group) lasso. By leaving this argument unspecified, the program computes the optimized value by a net-search cross-validation over [alphamin, alphamax] with nalpha candidate values.
part |
a vector of size p whose entries are the group sizes, i.e., the number of basis coefficients for each functional covariate (rep(basisno, p) in the examples below).
Xpred, Ypred |
the test-set versions of Xtrain and Ytrain, used to compute the predictions and MSEpredict.
Silence |
by default FALSE. Set it to TRUE to suppress the progress reports and time estimates.
bspline |
by default FALSE, in which case the Fourier basis is used; set it to TRUE to use the B-spline basis instead.
Penalty |
takes five values, three of which appear in the examples below: "glasso" (group lasso), "gelast" (group elastic net), and "OLS" (no shrinkage penalty).
predloss |
the prediction loss used only in the net searches and cross-validations. Inherited from the fGMD package.
eps |
the convergence threshold, inherited from the fGMD package.
maxit |
the maximum number of iterations, inherited from the fGMD package.
forcezero |
by default FALSE. If TRUE, estimated coefficient curves whose norms fall below the forcezeropar threshold are set exactly to zero.
lambdaderivative |
the regularization parameter of the second-derivative penalty on the estimated curves. The value 0 turns this penalty off. By leaving this argument unspecified, the program computes the optimized value by a net-search cross-validation with nlamder candidate values between lamdermin and lamdermax, using nfolder folds.
ADMM |
by default FALSE, in which case the fGMD algorithm is used. If TRUE, the ADMM algorithm is used instead; it runs in parallel on Linux (see numcores) and serially on Windows.
unbalanced |
by default FALSE. Set it to TRUE when the functional covariates are observed at different numbers of time points; in that case Xtrain, Xpred, and tt are lists with one element per covariate (see the examples and the layout sketch below).
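A sketch of the two accepted input layouts referenced above, with simulated data (shapes only; the dimensions are illustrative assumptions):

# Balanced layout: one p*n*nt array and a single time grid
p <- 4; n <- 50; nt <- 100
Xtrain <- array(NA_real_, dim = c(p, n, nt))
for (j in 1:p) for (i in 1:n)
  Xtrain[j, i, ] <- cumsum(rnorm(nt))  # Brownian-motion-like signals
tt <- (1:nt) / nt                      # grid scaled into (0,1)

# Unbalanced layout (unbalanced=TRUE): lists with one element per covariate
ntj <- c(90, 110, 100, 95)             # different numbers of time points
Xtrain.u <- vector("list", p)
tt.u     <- vector("list", p)
for (j in 1:p) {
  Xtrain.u[[j]] <- matrix(rnorm(n * ntj[j]), n, ntj[j])  # n x ntj[j] matrix
  tt.u[[j]]     <- (1:ntj[j]) / ntj[j]                   # grid for covariate j
}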
coef |
the vector of estimated basis coefficients, of size p*basisno (basisno entries per functional covariate).
predict |
the predicted response values associated with Xpred.
MSEpredict |
mean squared error of the predictions. |
lambda |
the cross-validated (regularized) value of lambda.
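The returned components can be read off as in the Examples (a sketch; results is a fitted object from a call with test data, and m is the basisno that was used):

sqrt(results$MSEpredict)    # test root mean squared error (needs Xpred/Ypred)
head(results$predict)       # predicted responses for Xpred
sum(results$coef == 0) / m  # how many functional coefficients are exactly zero
results$lambda              # the cross-validated (regularized) lambda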
Ali Mahzarnia and Jun Song
Maintainer: Ali Mahzarnia <amahzarn@uncc.edu> <ali.mahzarnia@duke.edu>
Ali Mahzarnia and Jun Song (2021), "Multivariate functional group sparse regression: functional predictor selection," submitted for publication. URL: https://arxiv.org/abs/2107.02146
Yang, Y. and Zou, H. (2015), "A Fast Unified Algorithm for Computing Group-Lasso Penalized Learning Problems," Statistics and Computing, 25(6), 1129-1141.
Friedman, J., Hastie, T., and Tibshirani, R. (2010), "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, 33(1), 1-22.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011), "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers," Foundations and Trends in Machine Learning, 3(1), 1-122.
Fan, J. and Li, R. (2001), "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, 96(456), 1348-1360.
fGMD package. Install the fGMD package in order for the MFSGrp package to work.
URL: https://github.com/Ali-Mahzarnia/fGMD
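A typical installation from GitHub might look as follows (a sketch; it assumes the remotes package and that MFSGrp is hosted under the same GitHub account, which this page does not state):

# install.packages("remotes")                    # if not already installed
remotes::install_github("Ali-Mahzarnia/fGMD")    # install fGMD first (URL above)
remotes::install_github("Ali-Mahzarnia/MFSGrp")  # assumed repository location
library(MFSGrp)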
p=35   # number of functional predictors for each observation
n=200  # sample size
nt=500 # number of recorded time points for each functional covariate
# nt will be reduced to 100 after the inner products are computed below
X=array(NaN, c(p, n, nt))
# Brownian motion
for(j in 1:p){
  for(i in 1:n){
    X[j,i,]=cumsum(rnorm(nt,0,1))
  }
}

# true nonzero coefs: beta_5, beta_8, beta_11; the rest are zeros
# beta_5(t)=sin(3*pi*t/2), beta_8(t)=sin(5*pi*t/2), and beta_11(t)=t^2
beta5=function(t){return(sin(3*pi*t/2))}
beta8=function(t){return(sin(5*pi*t/2))}
beta11=function(t){return(t^2)}
b5=matrix(0, ncol=1, nrow=nt)
b8=matrix(0, ncol=1, nrow=nt)
b11=matrix(0, ncol=1, nrow=nt)
# evaluate the population betas on (0,1) at five hundred time points
for(i in 1:nt){
  b5[i]=beta5(i/nt)
  b8[i]=beta8(i/nt)
  b11[i]=beta11(i/nt)
}

# evaluate the inner products of the Xs with beta_5, beta_8, and beta_11 via Riemann sums
Xb5=matrix(0, ncol=n, nrow=1)
Xb8=matrix(0, ncol=n, nrow=1)
Xb11=matrix(0, ncol=n, nrow=1)
for(j in 1:n){
  Xb5[j]=(X[5,j,]%*%b5)/nt
  Xb8[j]=(X[8,j,]%*%b8)/nt
  Xb11[j]=(X[11,j,]%*%b11)/nt
}

# construct Y
sd=0.05                                    # standard deviation of the noise term
eps=matrix(rnorm(n,0,sd), ncol=n, nrow=1)  # noise term
Y=Xb5+Xb8+Xb11+eps
# the algorithm takes care of the intercept in the prediction
Y=Y+3 # intercept

# make the design matrix (pick every 5th element); here nt becomes 100
X.obs=X[,,(1:100)*nt/100, drop=FALSE]
# observed times scaled to (0,1)
tt=(1:100)/100

# test and train sets (half and half)
trainIndex=sample(1:n, size=round(0.5*n), replace=FALSE)
Ytrain=Y[, trainIndex, drop=FALSE]
Ytest=Y[, -trainIndex, drop=FALSE]
Xtrain=X.obs[,trainIndex,, drop=FALSE]
Xtest=X.obs[,-trainIndex,, drop=FALSE]

# The model:
# 35 functional predictors in total;
# beta_5(t), beta_8(t), beta_11(t) are nonzero, the others are zero

# plot X^1, ..., X^9 (out of p=35 in total)
par(mfrow=c(3,3))
for(j in 1:9){
  plot(tt, Xtrain[j,1,], type='l', ylim=c(-30,30), main=paste0("j=",j))
  for(k in 2:10) lines(tt, Xtrain[j,k,])
}
par(mfrow=c(1,1))

# load the library
library(MFSGrp)

# run
m=17            # basisno
part=rep(m, p)  # partition
# green: true beta (only beta5, beta8, beta11 are nonzero functions)
# black: estimated betas

# lasso
# to see all figures after the run, use the "previous plot" arrow in RStudio
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="glasso", bspline=TRUE, sixplotnum="max",
               lamdermax=1e-3, lamdermin=1e-5)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients
results$lambda            # the regularized lambda

# lasso with forcezero
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="glasso", bspline=TRUE, sixplotnum="max",
               lamdermax=1e-3, lamdermin=1e-5, forcezero=TRUE)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients

# lasso with forcezero and without penalizing the second derivative
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="glasso", bspline=TRUE, sixplotnum="max",
               lambdaderivative=0, forcezero=TRUE)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients

# elastic net with different settings
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="gelast", bspline=FALSE, nlamder=15, nalpha=11, nfolder=4)
sum(results$coef==0)/m    # number of zero functional coefficients
sqrt(results$MSEpredict)  # test root MSE

# oracle
# as explained in the manual, because of C++, this regression is faster with ADMM, even on Windows
active=c(5,8,11)  # the active set: indices of the three nonzero coefficients
# note: the partition must match the reduced set of covariates
results=MFSGrp(Ytrain, Xtrain[active,,], basisno=m, tt, part=rep(m, length(active)),
               Xpred=Xtest[active,,], Ypred=Ytest, Penalty="OLS", bspline=TRUE,
               sixplotnum="max", lambdaderivative=NULL,
               lamdermax=1e-3, lamdermin=1e-5, nfolder=10, ADMM=TRUE)
sqrt(results$MSEpredict)  # test root MSE

# regression with no shrinkage or sparsity penalty, only the second-derivative penalty
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="OLS", bspline=TRUE, sixplotnum=3, ADMM=TRUE)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients (naturally zero)

# regression with no penalty
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="OLS", bspline=TRUE, sixplotnum=2, lambdaderivative=0)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients (naturally zero)

# a penalized example with ADMM
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="glasso", bspline=TRUE, sixplotnum="max",
               lamdermax=1e-3, lamdermin=1e-5, ADMM=TRUE)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients
results$lambda            # the regularized lambda

# a penalized example with ADMM and a different rho
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="glasso", bspline=TRUE, sixplotnum="max",
               lamdermax=1e-3, lamdermin=1e-5, ADMM=TRUE, rho=0.5)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients
results$lambda            # the regularized lambda

# remove the population curves before the data application
remove(b5, b8, b11)

# Unbalanced time points:
# We now generate functional data observed at different numbers of time points.
# For example, the first covariate is observed at 91 time points (or 91*5) and the second at 112 (or 112*5)
p=19  # number of functional predictors for each observation
n=300 # sample size
nt=5*sample(80:120, p, replace=TRUE)
# vector of numbers of recorded time points (*5; later we pick 1 of each 5)
# for each functional covariate; they differ and are chosen randomly here
X=vector("list", p)
# X is a list of length p; each item is an n*nt[j] matrix: X[[1]],...,X[[p]]
for(j in 1:p){  # first make matrices of n*nt[j] for each X[[j]], j=1,...,p
  X[[j]]=matrix(0, n, nt[j])
}
for(j in 1:p){  # Brownian motion
  for(i in 1:n){
    X[[j]][i,]=as.numeric(cumsum(rnorm(nt[j],0,1)))
  }
}

# true nonzero coefs: beta_1, beta_2, beta_3; the rest are zeros
# beta_1(t)=sin(3*pi*t/2), beta_2(t)=sin(5*pi*t/2), and beta_3(t)=t^2
beta1=function(t){return(sin(3*pi*t/2))}
beta2=function(t){return(sin(5*pi*t/2))}
beta3=function(t){return(t^2)}
b1=matrix(0, ncol=1, nrow=nt[1])
b2=matrix(0, ncol=1, nrow=nt[2])
b3=matrix(0, ncol=1, nrow=nt[3])
# evaluate the population betas on (0,1) grids
for(d in 1:3){
  temp=get(paste0("b", d))
  for(k in 1:nt[d]){
    temp[k]=get(paste0("beta", d))(k/nt[d])
  }
  assign(paste0("b", d), temp)
}

# evaluate the inner products of the Xs with beta_1, beta_2, and beta_3 via Riemann sums
Xb1=matrix(0, ncol=n, nrow=1)
Xb2=matrix(0, ncol=n, nrow=1)
Xb3=matrix(0, ncol=n, nrow=1)
for(j in 1:n){
  Xb1[j]=(X[[1]][j,]%*%b1)/nt[1]
  Xb2[j]=(X[[2]][j,]%*%b2)/nt[2]
  Xb3[j]=(X[[3]][j,]%*%b3)/nt[3]
}

# construct Y
sd=0.05                                    # standard deviation of the noise term
eps=matrix(rnorm(n,0,sd), ncol=n, nrow=1)  # noise term
Y=Xb1+Xb2+Xb3+eps
# the algorithm takes care of the intercept in the prediction
Y=Y+3 # intercept

# make the design matrices (pick every 5th element); here nt[j] becomes nt[j]/5
X.obs=vector("list", p)
for(j in 1:p){
  X.obs[[j]]=X[[j]][, seq(5, nt[j], 5), drop=FALSE]
  nt[j]=dim(X.obs[[j]])[2]
}
# observed times scaled to (0,1)
tt=vector("list", p)
for(j in 1:p){
  tt[[j]]=(1:nt[j])/nt[j]
}

# test and train sets (half and half)
trainIndex=sample(1:n, size=round(0.5*n), replace=FALSE)
Ytrain=Y[, trainIndex, drop=FALSE]
Ytest=Y[, -trainIndex, drop=FALSE]
Xtrain=vector("list", p)
Xtest=vector("list", p)
for(j in 1:p){
  Xtrain[[j]]=X.obs[[j]][trainIndex,, drop=FALSE]
  Xtest[[j]]=X.obs[[j]][-trainIndex,, drop=FALSE]
}

# The model:
# 19 functional predictors in total;
# beta_1(t), beta_2(t), beta_3(t) are nonzero, the others are zero

# plot X^1, ..., X^9 (out of p=19 in total)
par(mfrow=c(3,3))
for(j in 1:9){
  plot(tt[[j]], Xtrain[[j]][1,], type='l', ylim=c(-30,30), main=paste0("j=",j))
  for(k in 2:10) lines(tt[[j]], Xtrain[[j]][k,])
}
par(mfrow=c(1,1))

# load the library
library(MFSGrp)

# run
m=17            # basisno
part=rep(m, p)  # partition
# green: true beta (only beta1, beta2, beta3 are nonzero functions)
# black: estimated betas

# lasso
# to see all figures after the run, use the "previous plot" arrow in RStudio
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="glasso", bspline=TRUE, sixplotnum="max",
               lamdermax=1e-3, lamdermin=1e-5, unbalanced=TRUE)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients
results$lambda            # the regularized lambda

# lasso with forcezero
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="glasso", bspline=TRUE, sixplotnum="max",
               lamdermax=1e-3, lamdermin=1e-5, forcezero=TRUE, unbalanced=TRUE)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients

# elastic net with different settings
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="gelast", bspline=FALSE, nlamder=15, nalpha=11, nfolder=4,
               unbalanced=TRUE)
sum(results$coef==0)/m    # number of zero functional coefficients
sqrt(results$MSEpredict)  # test root MSE

# oracle
# as explained in the manual, because of C++, this regression is faster with ADMM, even on Windows
active=c(1,2,3)  # the active set: indices of the three nonzero coefficients
# note: the partition and the time grids must match the reduced set of covariates
results=MFSGrp(Ytrain, Xtrain[active], basisno=m, tt[active], part=rep(m, length(active)),
               Xpred=Xtest[active], Ypred=Ytest, Penalty="OLS", bspline=TRUE,
               sixplotnum="max", lambdaderivative=NULL,
               lamdermax=1e-3, lamdermin=1e-5, nfolder=10, ADMM=TRUE, unbalanced=TRUE)
sqrt(results$MSEpredict)  # test root MSE

# regression with no shrinkage or sparsity penalty, only the second-derivative penalty
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="OLS", bspline=TRUE, sixplotnum=3, ADMM=TRUE, unbalanced=TRUE)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients (naturally zero)

# regression with no penalty
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="OLS", bspline=TRUE, sixplotnum=2, lambdaderivative=0,
               unbalanced=TRUE)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients (naturally zero)

# a penalized example with ADMM
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="glasso", bspline=TRUE, sixplotnum="max",
               lamdermax=1e-3, lamdermin=1e-5, ADMM=TRUE, unbalanced=TRUE)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients
results$lambda            # the regularized lambda

# a penalized example with ADMM and a different rho
results=MFSGrp(Ytrain, Xtrain, basisno=m, tt, part=part, Xpred=Xtest, Ypred=Ytest,
               Penalty="glasso", bspline=TRUE, sixplotnum="max",
               lamdermax=1e-3, lamdermin=1e-5, ADMM=TRUE, rho=0.5, unbalanced=TRUE)
sqrt(results$MSEpredict)  # test root MSE
sum(results$coef==0)/m    # number of zero functional coefficients
results$lambda            # the regularized lambda

# remove the population curves before the data application
remove(b1, b2, b3)