R/gpOptimizationInterface.R
In lessSEM: Non-Smooth Regularization for Structural Equation Models

Documented in gpAdaptiveLasso gpCappedL1 gpElasticNet gpLasso gpLsp gpMcp gpRidge gpScad

#' gpLasso 
#' 
#' Implements lasso regularization for general purpose optimization problems.
#' The penalty function is given by:
#' \deqn{p( x_j) = \lambda |x_j|}
#' Lasso regularization will set parameters to zero if \eqn{\lambda} is large enough
#' 
#' The interface is similar to that of optim. Users have to supply a vector 
#' with starting values (important: This vector _must_ have labels) and a fitting
#' function. This fitting functions _must_ take a labeled vector with parameter
#' values as first argument. The remaining arguments are passed with the ... argument.
#' This is similar to optim.
#' 
#' The gradient function gr is optional. If set to NULL, the \pkg{numDeriv} package
#' will be used to approximate the gradients. Supplying a gradient function
#' can result in considerable speed improvements.  
#' 
#' Lasso regularization:
#' 
#' * Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical 
#' Society. Series B (Methodological), 58(1), 267–288.
#'  
#' For more details on GLMNET, see:
#' 
#' * Friedman, J., Hastie, T., & Tibshirani, R. (2010). 
#' Regularization Paths for Generalized Linear Models via Coordinate Descent. 
#' Journal of Statistical Software, 33(1), 1–20. https://doi.org/10.18637/jss.v033.i01
#' * Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., & Lin, C.-J. (2010).
#' A Comparison of Optimization Methods and Software for Large-scale 
#' L1-regularized Linear Classification. Journal of Machine Learning Research, 11, 3183–3234.
#' * Yuan, G.-X., Ho, C.-H., & Lin, C.-J. (2012). 
#' An improved GLMNET for l1-regularized logistic regression. 
#' The Journal of Machine Learning Research, 13, 1999–2030. https://doi.org/10.1145/2020408.2020421
#' 
#' For more details on ISTA, see:
#' 
#' * Beck, A., & Teboulle, M. (2009). A Fast Iterative Shrinkage-Thresholding 
#' Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1),
#' 183–202. https://doi.org/10.1137/080716542
#' * Gong, P., Zhang, C., Lu, Z., Huang, J., & Ye, J. (2013). 
#' A General Iterative Shrinkage and Thresholding Algorithm for Non-convex 
#' Regularized Optimization Problems. Proceedings of the 30th International 
#' Conference on Machine Learning, 28(2)(2), 37–45.
#' * Parikh, N., & Boyd, S. (2013). Proximal Algorithms. Foundations and 
#' Trends in Optimization, 1(3), 123–231.
#'  
#' @param par labeled vector with starting values
#' @param regularized vector with names of parameters which are to be regularized.
#' @param fn R function which takes the parameters as input and returns the 
#' fit value (a single value)
#' @param gr R function which takes the parameters as input and returns the 
#' gradients of the objective function. If set to NULL, numDeriv will be used
#' to approximate the gradients 
#' @param lambdas numeric vector: values for the tuning parameter lambda
#' @param nLambdas alternative to lambda: If alpha = 1, lessSEM can automatically
#' compute the first lambda value which sets all regularized parameters to zero.
#' It will then generate nLambda values between 0 and the computed lambda.
#' @param reverse if set to TRUE and nLambdas is used, lessSEM will start with the
#' largest lambda and gradually decrease lambda. Otherwise, lessSEM will start with
#' the smallest lambda and gradually increase it.
#' @param curve Allows for unequally spaced lambda steps (e.g., .01,.02,.05,1,5,20). 
#' If curve is close to 1 all lambda values will be equally spaced, if curve is large 
#' lambda values will be more concentrated close to 0. See ?lessSEM::curveLambda for more information.
#' @param ... additional arguments passed to fn and gr
#' @param method which optimizer should be used? Currently implemented are ista
#' and glmnet. 
#' @param control used to control the optimizer. This element is generated with 
#' the controlIsta and controlGlmnet functions. See ?controlIsta and ?controlGlmnet
#' for more details.
#' @returns Object of class gpRegularized

#' @examples 
#' # This example shows how to use the optimizers
#' # for other objective functions. We will use
#' # a linear regression as an example. Note that
#' # this is not a useful application of the optimizers
#' # as there are specialized packages for linear regression
#' # (e.g., glmnet)
#' 
#' library(lessSEM)
#' set.seed(123)
#' 
#' # first, we simulate data for our
#' # linear regression.
#' N <- 100 # number of persons
#' p <- 10 # number of predictors
#' X <- matrix(rnorm(N*p),	nrow = N, ncol = p) # design matrix
#' b <- c(rep(1,4), 
#'        rep(0,6)) # true regression weights
#' y <- X%*%matrix(b,ncol = 1) + rnorm(N,0,.2)
#' 
#' # First, we must construct a fiting function
#' # which returns a single value. We will use
#' # the residual sum squared as fitting function.
#' 
#' # Let's start setting up the fitting function:
#' fittingFunction <- function(par, y, X, N){
#'   # par is the parameter vector
#'   # y is the observed dependent variable
#'   # X is the design matrix
#'   # N is the sample size
#'   pred <- X %*% matrix(par, ncol = 1) #be explicit here: 
#'   # we need par to be a column vector
#'   sse <- sum((y - pred)^2)
#'   # we scale with .5/N to get the same results as glmnet
#'   return((.5/N)*sse)
#' }
#' 
#' # let's define the starting values:
#' b <- rep(0,p)
#' names(b) <- paste0("b", 1:length(b))
#' # names of regularized parameters
#' regularized <- paste0("b",1:p)
#' 
#' # optimize
#' lassoPen <- gpLasso(
#'   par = b, 
#'   regularized = regularized, 
#'   fn = fittingFunction, 
#'   nLambdas = 100, 
#'   X = X,
#'   y = y,
#'   N = N
#' )
#' plot(lassoPen)
#' 
#' # You can access the fit results as follows:
#' lassoPen@fits
#' # Note that we won't compute any fit measures automatically, as
#' # we cannot be sure how the AIC, BIC, etc are defined for your objective function 
#' 
#' @export
gpLasso <- function(par,
                    regularized,
                    fn,
                    gr = NULL,
                    lambdas = NULL,
                    nLambdas = NULL,
                    reverse = TRUE,
                    curve = 1,
                    ...,
                    method = "glmnet", 
                    control = lessSEM::controlGlmnet()
){
  
  removeDotDotDot <- .noDotDotDot(fn, fnName = "fn", ... = ...)
  fn <- removeDotDotDot[[1]]
  additionalArguments <- removeDotDotDot$additionalArguments
  
  if(!is.null(gr)){
    
    removeDotDotDot <- .noDotDotDot(gr, fnName = "gr", ...)
    gr <- removeDotDotDot[[1]]
    additionalArguments <- c(additionalArguments, 
                             removeDotDotDot$additionalArguments[!names(removeDotDotDot$additionalArguments) %in% names(additionalArguments)])
  }
  
  # remove the ... stuff so that it does not interfere with anything else
  rm("...")
  
  if(is.null(lambdas) && is.null(nLambdas)){
    stop("Specify either lambdas or nLambdas")
  }
  
  if(!is.null(nLambdas)){
    tuningParameters <- data.frame(nLambdas = nLambdas,
                                   reverse = reverse,
                                   curve = curve)
  }else{
    tuningParameters <- data.frame(lambda = lambdas,
                                   alpha = 1,
                                   theta = 0)
  }
  
  result <- .gpOptimizationInternal(par = par,
                                    fn = fn,
                                    gr = gr,
                                    additionalArguments = additionalArguments,
                                    penalty = "lasso", 
                                    weights = regularized,
                                    tuningParameters = tuningParameters, 
                                    method = method,
                                    control = control
  )
  
  return(result)
  
}

#' gpAdaptiveLasso 
#' 
#' Implements adaptive lasso regularization for general purpose optimization problems.
#' The penalty function is given by:
#' \deqn{p( x_j) = p( x_j) = \frac{1}{w_j}\lambda| x_j|}
#' Adaptive lasso regularization will set parameters to zero if \eqn{\lambda} is large enough.
#' 
#' The interface is similar to that of optim. Users have to supply a vector 
#' with starting values (important: This vector _must_ have labels) and a fitting
#' function. This fitting functions _must_ take a labeled vector with parameter
#' values as first argument. The remaining arguments are passed with the ... argument.
#' This is similar to optim.
#' 
#' The gradient function gr is optional. If set to NULL, the \pkg{numDeriv} package
#' will be used to approximate the gradients. Supplying a gradient function
#' can result in considerable speed improvements.  
#' 
#' Adaptive lasso regularization:
#' 
#' * Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 
#' 101(476), 1418–1429. https://doi.org/10.1198/016214506000000735
#'  
#' For more details on GLMNET, see:
#' 
#' * Friedman, J., Hastie, T., & Tibshirani, R. (2010). 
#' Regularization Paths for Generalized Linear Models via Coordinate Descent. 
#' Journal of Statistical Software, 33(1), 1–20. https://doi.org/10.18637/jss.v033.i01
#' * Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., & Lin, C.-J. (2010).
#' A Comparison of Optimization Methods and Software for Large-scale 
#' L1-regularized Linear Classification. Journal of Machine Learning Research, 11, 3183–3234.
#' * Yuan, G.-X., Ho, C.-H., & Lin, C.-J. (2012). 
#' An improved GLMNET for l1-regularized logistic regression. 
#' The Journal of Machine Learning Research, 13, 1999–2030. https://doi.org/10.1145/2020408.2020421
#' 
#' For more details on ISTA, see:
#' 
#' * Beck, A., & Teboulle, M. (2009). A Fast Iterative Shrinkage-Thresholding 
#' Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1),
#' 183–202. https://doi.org/10.1137/080716542
#' * Gong, P., Zhang, C., Lu, Z., Huang, J., & Ye, J. (2013). 
#' A General Iterative Shrinkage and Thresholding Algorithm for Non-convex 
#' Regularized Optimization Problems. Proceedings of the 30th International 
#' Conference on Machine Learning, 28(2)(2), 37–45.
#' * Parikh, N., & Boyd, S. (2013). Proximal Algorithms. Foundations and 
#' Trends in Optimization, 1(3), 123–231.
#'  
#' @param par labeled vector with starting values
#' @param regularized vector with names of parameters which are to be regularized.
#' @param weights labeled vector with adaptive lasso weights. NULL will use 1/abs(par)
#' @param fn R function which takes the parameters as input and returns the 
#' fit value (a single value)
#' @param gr R function which takes the parameters as input and returns the 
#' gradients of the objective function. If set to NULL, numDeriv will be used
#' to approximate the gradients 
#' @param lambdas numeric vector: values for the tuning parameter lambda
#' @param nLambdas alternative to lambda: If alpha = 1, lessSEM can automatically
#' compute the first lambda value which sets all regularized parameters to zero.
#' It will then generate nLambda values between 0 and the computed lambda.
#' @param reverse if set to TRUE and nLambdas is used, lessSEM will start with the
#' largest lambda and gradually decrease lambda. Otherwise, lessSEM will start with
#' the smallest lambda and gradually increase it.
#' @param curve Allows for unequally spaced lambda steps (e.g., .01,.02,.05,1,5,20). 
#' If curve is close to 1 all lambda values will be equally spaced, if curve is large 
#' lambda values will be more concentrated close to 0. See ?lessSEM::curveLambda for more information.
#' @param ... additional arguments passed to fn and gr
#' @param method which optimizer should be used? Currently implemented are ista
#' and glmnet. 
#' @param control used to control the optimizer. This element is generated with 
#' the controlIsta and controlGlmnet functions. See ?controlIsta and ?controlGlmnet
#' for more details.
#' @returns Object of class gpRegularized

#' @examples 
#' # This example shows how to use the optimizers
#' # for other objective functions. We will use
#' # a linear regression as an example. Note that
#' # this is not a useful application of the optimizers
#' # as there are specialized packages for linear regression
#' # (e.g., glmnet)
#' 
#' library(lessSEM)
#' set.seed(123)
#' 
#' # first, we simulate data for our
#' # linear regression.
#' N <- 100 # number of persons
#' p <- 10 # number of predictors
#' X <- matrix(rnorm(N*p),	nrow = N, ncol = p) # design matrix
#' b <- c(rep(1,4), 
#'        rep(0,6)) # true regression weights
#' y <- X%*%matrix(b,ncol = 1) + rnorm(N,0,.2)
#' 
#' # First, we must construct a fiting function
#' # which returns a single value. We will use
#' # the residual sum squared as fitting function.
#' 
#' # Let's start setting up the fitting function:
#' fittingFunction <- function(par, y, X, N){
#'   # par is the parameter vector
#'   # y is the observed dependent variable
#'   # X is the design matrix
#'   # N is the sample size
#'   pred <- X %*% matrix(par, ncol = 1) #be explicit here: 
#'   # we need par to be a column vector
#'   sse <- sum((y - pred)^2)
#'   # we scale with .5/N to get the same results as glmnet
#'   return((.5/N)*sse)
#' }
#' 
#' # let's define the starting values:
#' b <- c(solve(t(X)%*%X)%*%t(X)%*%y) # we will use the lm estimates
#' names(b) <- paste0("b", 1:length(b))
#' # names of regularized parameters
#' regularized <- paste0("b",1:p)
#' 
#' # define the weight for each of the parameters
#' weights <- 1/abs(b)
#' # we will re-scale the weights for equivalence to glmnet.
#' # see ?glmnet for more details
#' weights <- length(b)*weights/sum(weights)
#' 
#' # optimize
#' adaptiveLassoPen <- gpAdaptiveLasso(
#'   par = b, 
#'   regularized = regularized, 
#'   weights = weights,
#'   fn = fittingFunction, 
#'   lambdas = seq(0,1,.01), 
#'   X = X,
#'   y = y,
#'   N = N
#' )
#' plot(adaptiveLassoPen)
#' # You can access the fit results as follows:
#' adaptiveLassoPen@fits
#' # Note that we won't compute any fit measures automatically, as
#' # we cannot be sure how the AIC, BIC, etc are defined for your objective function 
#' 
#' # for comparison:
#' # library(glmnet)
#' # coef(glmnet(x = X,
#' #            y = y,
#' #            penalty.factor = weights,
#' #            lambda = adaptiveLassoPen@fits$lambda[20],
#' #            intercept = FALSE,
#' #            standardize = FALSE))[,1]
#' # adaptiveLassoPen@parameters[20,]
#' @export
gpAdaptiveLasso <- function(par,
                            regularized,
                            weights = NULL,
                            fn,
                            gr = NULL,
                            lambdas = NULL,
                            nLambdas = NULL,
                            reverse = TRUE,
                            curve = 1,
                            ...,
                            method = "glmnet", 
                            control = lessSEM::controlGlmnet()){
  
  removeDotDotDot <- .noDotDotDot(fn, fnName = "fn", ... = ...)
  fn <- removeDotDotDot[[1]]
  additionalArguments <- removeDotDotDot$additionalArguments
  
  if(!is.null(gr)){
    
    removeDotDotDot <- .noDotDotDot(gr, fnName = "gr", ...)
    gr <- removeDotDotDot[[1]]
    additionalArguments <- c(additionalArguments, 
                             removeDotDotDot$additionalArguments[!names(removeDotDotDot$additionalArguments) %in% names(additionalArguments)])
  }
  
  # remove the ... stuff so that it does not interfere with anything else
  rm("...")
  
  if(is.null(weights)){
    weights <- 1/abs(par)
    weights[!names(weights) %in% regularized] <- 0
    cat("\n")
    rlang::inform(c("Note","Building weights based on par as weights = 1/abs(par)."))
  }
  
  if(! all(regularized %in% names(weights))) stop(paste0(
    "You specified that the following parameters should be regularized:\n",
    paste0(regularized, collapse = ", "), 
    ". Not all of these parameters could be found in the model.\n",
    "The model has the following parameters:\n",
    names(weights)
  ))
  
  if(is.null(lambdas) && is.null(nLambdas)){
    stop("Specify either lambdas or nLambdas")
  }
  
  if(!is.null(nLambdas)){
    tuningParameters <- data.frame(nLambdas = nLambdas,
                                   reverse = reverse,
                                   curve = curve)
  }else{
    tuningParameters <- data.frame(lambda = lambdas,
                                   alpha = 1,
                                   theta = 0)
  }
  
  result <- .gpOptimizationInternal(par = par,
                                    fn = fn,
                                    gr = gr,
                                    additionalArguments = additionalArguments,
                                    penalty = "adaptiveLasso", 
                                    weights = weights,
                                    tuningParameters = tuningParameters, 
                                    method = method,
                                    control = control
  )
  
  return(result)
  
}

#' gpRidge
#' 
#' Implements ridge regularization for general purpose optimization problems.
#' The penalty function is given by:
#' \deqn{p( x_j) = \lambda x_j^2}
#' Note that ridge regularization will not set any of the parameters to zero
#' but result in a shrinkage towards zero. 
#' 
#' The interface is similar to that of optim. Users have to supply a vector 
#' with starting values (important: This vector _must_ have labels) and a fitting
#' function. This fitting functions _must_ take a labeled vector with parameter
#' values as first argument. The remaining arguments are passed with the ... argument.
#' This is similar to optim.
#' 
#' The gradient function gr is optional. If set to NULL, the \pkg{numDeriv} package
#' will be used to approximate the gradients. Supplying a gradient function
#' can result in considerable speed improvements.  
#' 
#' Ridge regularization:
#' 
#' * Hoerl, A. E., & Kennard, R. W. (1970). Ridge Regression: Biased Estimation 
#' for Nonorthogonal Problems. Technometrics, 12(1), 55–67. 
#' https://doi.org/10.1080/00401706.1970.10488634
#' 
#' For more details on GLMNET, see:
#' 
#' * Friedman, J., Hastie, T., & Tibshirani, R. (2010). 
#' Regularization Paths for Generalized Linear Models via Coordinate Descent. 
#' Journal of Statistical Software, 33(1), 1–20. https://doi.org/10.18637/jss.v033.i01
#' * Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., & Lin, C.-J. (2010).
#' A Comparison of Optimization Methods and Software for Large-scale 
#' L1-regularized Linear Classification. Journal of Machine Learning Research, 11, 3183–3234.
#' * Yuan, G.-X., Ho, C.-H., & Lin, C.-J. (2012). 
#' An improved GLMNET for l1-regularized logistic regression. 
#' The Journal of Machine Learning Research, 13, 1999–2030. https://doi.org/10.1145/2020408.2020421
#' 
#' For more details on ISTA, see:
#' 
#' * Beck, A., & Teboulle, M. (2009). A Fast Iterative Shrinkage-Thresholding 
#' Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1),
#' 183–202. https://doi.org/10.1137/080716542
#' * Gong, P., Zhang, C., Lu, Z., Huang, J., & Ye, J. (2013). 
#' A General Iterative Shrinkage and Thresholding Algorithm for Non-convex 
#' Regularized Optimization Problems. Proceedings of the 30th International 
#' Conference on Machine Learning, 28(2)(2), 37–45.
#' * Parikh, N., & Boyd, S. (2013). Proximal Algorithms. Foundations and 
#' Trends in Optimization, 1(3), 123–231.
#'  
#' @param par labeled vector with starting values
#' @param regularized vector with names of parameters which are to be regularized.
#' @param fn R function which takes the parameters as input and returns the 
#' fit value (a single value)
#' @param gr R function which takes the parameters as input and returns the 
#' gradients of the objective function. If set to NULL, numDeriv will be used
#' to approximate the gradients 
#' @param lambdas numeric vector: values for the tuning parameter lambda
#' @param ... additional arguments passed to fn and gr
#' @param method which optimizer should be used? Currently implemented are ista
#' and glmnet. 
#' @param control used to control the optimizer. This element is generated with 
#' the controlIsta and controlGlmnet functions. See ?controlIsta and ?controlGlmnet
#' for more details.
#' @returns Object of class gpRegularized

#' @examples 
#' # This example shows how to use the optimizers
#' # for other objective functions. We will use
#' # a linear regression as an example. Note that
#' # this is not a useful application of the optimizers
#' # as there are specialized packages for linear regression
#' # (e.g., glmnet)
#' 
#' library(lessSEM)
#' set.seed(123)
#' 
#' # first, we simulate data for our
#' # linear regression.
#' N <- 100 # number of persons
#' p <- 10 # number of predictors
#' X <- matrix(rnorm(N*p),	nrow = N, ncol = p) # design matrix
#' b <- c(rep(1,4),
#'        rep(0,6)) # true regression weights
#' y <- X%*%matrix(b,ncol = 1) + rnorm(N,0,.2)
#' 
#' # First, we must construct a fiting function
#' # which returns a single value. We will use
#' # the residual sum squared as fitting function.
#' 
#' # Let's start setting up the fitting function:
#' fittingFunction <- function(par, y, X, N){
#'   # par is the parameter vector
#'   # y is the observed dependent variable
#'   # X is the design matrix
#'   # N is the sample size
#'   pred <- X %*% matrix(par, ncol = 1) #be explicit here:
#'   # we need par to be a column vector
#'   sse <- sum((y - pred)^2)
#'   # we scale with .5/N to get the same results as glmnet
#'   return((.5/N)*sse)
#' }
#' 
#' # let's define the starting values:
#' b <- c(solve(t(X)%*%X)%*%t(X)%*%y) # we will use the lm estimates
#' names(b) <- paste0("b", 1:length(b))
#' # names of regularized parameters
#' regularized <- paste0("b",1:p)
#' 
#' # optimize
#' ridgePen <- gpRidge(
#'   par = b,
#'   regularized = regularized,
#'   fn = fittingFunction,
#'   lambdas = seq(0,1,.01),
#'   X = X,
#'   y = y,
#'   N = N
#' )
#' plot(ridgePen)
#' 
#' # for comparison:
#' # fittingFunction <- function(par, y, X, N, lambda){
#' #   pred <- X %*% matrix(par, ncol = 1) 
#' #   sse <- sum((y - pred)^2)
#' #   return((.5/N)*sse + lambda * sum(par^2))
#' # }
#' # 
#' # optim(par = b, 
#' #       fn = fittingFunction, 
#' #       y = y,
#' #       X = X,
#' #       N = N,
#' #       lambda =  ridgePen@fits$lambda[20], 
#' #       method = "BFGS")$par
#' # ridgePen@parameters[20,]
#' @export
gpRidge <- function(par,
                    regularized,
                    fn,
                    gr = NULL,
                    lambdas,
                    ...,
                    method = "glmnet", 
                    control = lessSEM::controlGlmnet()){
  
  removeDotDotDot <- .noDotDotDot(fn, fnName = "fn", ... = ...)
  fn <- removeDotDotDot[[1]]
  additionalArguments <- removeDotDotDot$additionalArguments
  
  if(!is.null(gr)){
    
    removeDotDotDot <- .noDotDotDot(gr, fnName = "gr", ...)
    gr <- removeDotDotDot[[1]]
    additionalArguments <- c(additionalArguments, 
                             removeDotDotDot$additionalArguments[!names(removeDotDotDot$additionalArguments) %in% names(additionalArguments)])
  }
  
  # remove the ... stuff so that it does not interfere with anything else
  rm("...")
  
  
  tuningParameters <- data.frame(lambda = lambdas,
                                 alpha = 0,
                                 theta = 0)
  
  result <- .gpOptimizationInternal(par = par,
                                    fn = fn,
                                    gr = gr,
                                    additionalArguments = additionalArguments,
                                    penalty = "ridge", 
                                    weights = regularized,
                                    tuningParameters = tuningParameters, 
                                    method = method,
                                    control = control
  )
  
  return(result)
  
}

#' gpElasticNet 
#' 
#' Implements elastic net regularization for general purpose optimization problems.
#' The penalty function is given by:
#' \deqn{p( x_j) = p( x_j) = \frac{1}{w_j}\lambda| x_j|}
#' Note that the elastic net combines ridge and lasso regularization. If \eqn{\alpha = 0}, 
#' the elastic net reduces to ridge regularization. If \eqn{\alpha = 1} it reduces
#' to lasso regularization. In between, elastic net is a compromise between the shrinkage of
#' the lasso and the ridge penalty. 
#' 
#' The interface is similar to that of optim. Users have to supply a vector 
#' with starting values (important: This vector _must_ have labels) and a fitting
#' function. This fitting functions _must_ take a labeled vector with parameter
#' values as first argument. The remaining arguments are passed with the ... argument.
#' This is similar to optim.
#' 
#' The gradient function gr is optional. If set to NULL, the \pkg{numDeriv} package
#' will be used to approximate the gradients. Supplying a gradient function
#' can result in considerable speed improvements.  
#' 
#' Elastic net regularization:
#' 
#' * Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. 
#' Journal of the Royal Statistical Society: Series B, 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
#'  
#' For more details on GLMNET, see:
#' 
#' * Friedman, J., Hastie, T., & Tibshirani, R. (2010). 
#' Regularization Paths for Generalized Linear Models via Coordinate Descent. 
#' Journal of Statistical Software, 33(1), 1–20. https://doi.org/10.18637/jss.v033.i01
#' * Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., & Lin, C.-J. (2010).
#' A Comparison of Optimization Methods and Software for Large-scale 
#' L1-regularized Linear Classification. Journal of Machine Learning Research, 11, 3183–3234.
#' * Yuan, G.-X., Ho, C.-H., & Lin, C.-J. (2012). 
#' An improved GLMNET for l1-regularized logistic regression. 
#' The Journal of Machine Learning Research, 13, 1999–2030. https://doi.org/10.1145/2020408.2020421
#' 
#' For more details on ISTA, see:
#' 
#' * Beck, A., & Teboulle, M. (2009). A Fast Iterative Shrinkage-Thresholding 
#' Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1),
#' 183–202. https://doi.org/10.1137/080716542
#' * Gong, P., Zhang, C., Lu, Z., Huang, J., & Ye, J. (2013). 
#' A General Iterative Shrinkage and Thresholding Algorithm for Non-convex 
#' Regularized Optimization Problems. Proceedings of the 30th International 
#' Conference on Machine Learning, 28(2)(2), 37–45.
#' * Parikh, N., & Boyd, S. (2013). Proximal Algorithms. Foundations and 
#' Trends in Optimization, 1(3), 123–231.
#'  
#' @param par labeled vector with starting values
#' @param regularized vector with names of parameters which are to be regularized.
#' @param fn R function which takes the parameters AND their labels 
#' as input and returns the fit value (a single value)
#' @param gr R function which takes the parameters AND their labels
#' as input and returns the gradients of the objective function. 
#' If set to NULL, numDeriv will be used to approximate the gradients 
#' @param lambdas numeric vector: values for the tuning parameter lambda
#' @param alphas numeric vector with values of the tuning parameter alpha. Must be
#' between 0 and 1. 0 = ridge, 1 = lasso.
#' @param regularized vector with names of parameters which are to be regularized.
#' @param ... additional arguments passed to fn and gr
#' @param method which optimizer should be used? Currently implemented are ista
#' and glmnet. 
#' @param control used to control the optimizer. This element is generated with 
#' the controlIsta and controlGlmnet functions. See ?controlIsta and ?controlGlmnet
#' for more details.
#' @returns Object of class gpRegularized
#' @examples
#' # This example shows how to use the optimizers
#' # for other objective functions. We will use
#' # a linear regression as an example. Note that
#' # this is not a useful application of the optimizers
#' # as there are specialized packages for linear regression
#' # (e.g., glmnet)
#' 
#' library(lessSEM)
#' set.seed(123)
#' 
#' # first, we simulate data for our
#' # linear regression.
#' N <- 100 # number of persons
#' p <- 10 # number of predictors
#' X <- matrix(rnorm(N*p),	nrow = N, ncol = p) # design matrix
#' b <- c(rep(1,4),
#'        rep(0,6)) # true regression weights
#' y <- X%*%matrix(b,ncol = 1) + rnorm(N,0,.2)
#' 
#' # First, we must construct a fiting function
#' # which returns a single value. We will use
#' # the residual sum squared as fitting function.
#' 
#' # Let's start setting up the fitting function:
#' fittingFunction <- function(par, y, X, N){
#'   # par is the parameter vector
#'   # y is the observed dependent variable
#'   # X is the design matrix
#'   # N is the sample size
#'   pred <- X %*% matrix(par, ncol = 1) #be explicit here:
#'   # we need par to be a column vector
#'   sse <- sum((y - pred)^2)
#'   # we scale with .5/N to get the same results as glmnet
#'   return((.5/N)*sse)
#' }
#' 
#' # let's define the starting values:
#' b <- c(solve(t(X)%*%X)%*%t(X)%*%y) # we will use the lm estimates
#' names(b) <- paste0("b", 1:length(b))
#' # names of regularized parameters
#' regularized <- paste0("b",1:p)
#' 
#' # optimize
#' elasticNetPen <- gpElasticNet(
#'   par = b,
#'   regularized = regularized,
#'   fn = fittingFunction,
#'   lambdas = seq(0,1,.1),
#'   alphas = c(0, .5, 1),
#'   X = X,
#'   y = y,
#'   N = N
#' )
#' 
#' # optional: plot requires plotly package
#' # plot(elasticNetPen)
#' 
#' # for comparison:
#' fittingFunction <- function(par, y, X, N, lambda, alpha){
#'   pred <- X %*% matrix(par, ncol = 1)
#'   sse <- sum((y - pred)^2)
#'   return((.5/N)*sse + (1-alpha)*lambda * sum(par^2) + alpha*lambda *sum(sqrt(par^2 + 1e-8)))
#' }
#' 
#' round(
#'   optim(par = b,
#'       fn = fittingFunction,
#'       y = y,
#'       X = X,
#'       N = N,
#'       lambda =  elasticNetPen@fits$lambda[15],
#'       alpha =  elasticNetPen@fits$alpha[15],
#'       method = "BFGS")$par,
#'   4)
#' elasticNetPen@parameters[15,]
#' @export
gpElasticNet <- function(par,
                         regularized,
                         fn,
                         gr = NULL,
                         lambdas,
                         alphas,
                         ...,
                         method = "glmnet", 
                         control = lessSEM::controlGlmnet()){
  
  removeDotDotDot <- .noDotDotDot(fn, fnName = "fn", ... = ...)
  fn <- removeDotDotDot[[1]]
  additionalArguments <- removeDotDotDot$additionalArguments
  
  if(!is.null(gr)){
    
    removeDotDotDot <- .noDotDotDot(gr, fnName = "gr", ...)
    gr <- removeDotDotDot[[1]]
    additionalArguments <- c(additionalArguments, 
                             removeDotDotDot$additionalArguments[!names(removeDotDotDot$additionalArguments) %in% names(additionalArguments)])
  }
  
  # remove the ... stuff so that it does not interfere with anything else
  rm("...")
  
  tuningParameters <- expand.grid(lambda = lambdas,
                                  alpha = alphas,
                                  theta = 0)
  
  
  result <- .gpOptimizationInternal(par = par,
                                    fn = fn,
                                    gr = gr,
                                    additionalArguments = additionalArguments,
                                    penalty = "elasticNet", 
                                    weights = regularized,
                                    tuningParameters = tuningParameters, 
                                    method = method,
                                    control = control
  )
  
  return(result)
}


#' gpCappedL1 
#' 
#' Implements cappedL1 regularization for general purpose optimization problems.
#' The penalty function is given by:
#' \deqn{p( x_j) = \lambda \min(| x_j|, \theta)}
#' where \eqn{\theta > 0}. The cappedL1 penalty is identical to the lasso for 
#' parameters which are below \eqn{\theta} and identical to a constant for parameters
#' above \eqn{\theta}. As adding a constant to the fitting function will not change its
#' minimum, larger parameters can stay unregularized while smaller ones are set to zero.
#' 
#' The interface is similar to that of optim. Users have to supply a vector 
#' with starting values (important: This vector _must_ have labels) and a fitting
#' function. This fitting functions _must_ take a labeled vector with parameter
#' values as first argument. The remaining arguments are passed with the ... argument.
#' This is similar to optim.
#' 
#' The gradient function gr is optional. If set to NULL, the \pkg{numDeriv} package
#' will be used to approximate the gradients. Supplying a gradient function
#' can result in considerable speed improvements.  
#' 
#' CappedL1 regularization:
#' 
#' * Zhang, T. (2010). Analysis of Multi-stage Convex Relaxation for Sparse Regularization. 
#' Journal of Machine Learning Research, 11, 1081–1107.
#'  
#' For more details on GLMNET, see:
#' 
#' * Friedman, J., Hastie, T., & Tibshirani, R. (2010). 
#' Regularization Paths for Generalized Linear Models via Coordinate Descent. 
#' Journal of Statistical Software, 33(1), 1–20. https://doi.org/10.18637/jss.v033.i01
#' * Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., & Lin, C.-J. (2010).
#' A Comparison of Optimization Methods and Software for Large-scale 
#' L1-regularized Linear Classification. Journal of Machine Learning Research, 11, 3183–3234.
#' * Yuan, G.-X., Ho, C.-H., & Lin, C.-J. (2012). 
#' An improved GLMNET for l1-regularized logistic regression. 
#' The Journal of Machine Learning Research, 13, 1999–2030. https://doi.org/10.1145/2020408.2020421
#' 
#' For more details on ISTA, see:
#' 
#' * Beck, A., & Teboulle, M. (2009). A Fast Iterative Shrinkage-Thresholding 
#' Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1),
#' 183–202. https://doi.org/10.1137/080716542
#' * Gong, P., Zhang, C., Lu, Z., Huang, J., & Ye, J. (2013). 
#' A General Iterative Shrinkage and Thresholding Algorithm for Non-convex 
#' Regularized Optimization Problems. Proceedings of the 30th International 
#' Conference on Machine Learning, 28(2)(2), 37–45.
#' * Parikh, N., & Boyd, S. (2013). Proximal Algorithms. Foundations and 
#' Trends in Optimization, 1(3), 123–231.
#'  
#' @param par labeled vector with starting values
#' @param fn R function which takes the parameters AND their labels 
#' as input and returns the fit value (a single value)
#' @param gr R function which takes the parameters AND their labels
#' as input and returns the gradients of the objective function. 
#' If set to NULL, numDeriv will be used to approximate the gradients 
#' @param ... additional arguments passed to fn and gr
#' @param regularized vector with names of parameters which are to be regularized.
#' @param lambdas numeric vector: values for the tuning parameter lambda
#' @param thetas parameters whose absolute value is above this threshold will be penalized with
#' a constant (theta)
#' @param method which optimizer should be used? Currently implemented are ista
#' and glmnet. 
#' @param control used to control the optimizer. This element is generated with 
#' the controlIsta and controlGlmnet functions. See ?controlIsta and ?controlGlmnet
#' for more details.
#' @returns Object of class gpRegularized

#' @examples 
#' # This example shows how to use the optimizers
#' # for other objective functions. We will use
#' # a linear regression as an example. Note that
#' # this is not a useful application of the optimizers
#' # as there are specialized packages for linear regression
#' # (e.g., glmnet)
#' 
#' # This example shows how to use the optimizers
#' # for other objective functions. We will use
#' # a linear regression as an example. Note that
#' # this is not a useful application of the optimizers
#' # as there are specialized packages for linear regression
#' # (e.g., glmnet)
#' 
#' library(lessSEM)
#' set.seed(123)
#' 
#' # first, we simulate data for our
#' # linear regression.
#' N <- 100 # number of persons
#' p <- 10 # number of predictors
#' X <- matrix(rnorm(N*p),	nrow = N, ncol = p) # design matrix
#' b <- c(rep(1,4),
#'        rep(0,6)) # true regression weights
#' y <- X%*%matrix(b,ncol = 1) + rnorm(N,0,.2)
#' 
#' # First, we must construct a fiting function
#' # which returns a single value. We will use
#' # the residual sum squared as fitting function.
#' 
#' # Let's start setting up the fitting function:
#' fittingFunction <- function(par, y, X, N){
#'   # par is the parameter vector
#'   # y is the observed dependent variable
#'   # X is the design matrix
#'   # N is the sample size
#'   pred <- X %*% matrix(par, ncol = 1) #be explicit here:
#'   # we need par to be a column vector
#'   sse <- sum((y - pred)^2)
#'   # we scale with .5/N to get the same results as glmnet
#'   return((.5/N)*sse)
#' }
#' 
#' # let's define the starting values:
#' b <- c(solve(t(X)%*%X)%*%t(X)%*%y) # we will use the lm estimates
#' names(b) <- paste0("b", 1:length(b))
#' # names of regularized parameters
#' regularized <- paste0("b",1:p)
#' 
#' # optimize
#' cL1 <- gpCappedL1(
#'   par = b,
#'   regularized = regularized,
#'   fn = fittingFunction,
#'   lambdas = seq(0,1,.1),
#'   thetas = c(0.001, .5, 1),
#'   X = X,
#'   y = y,
#'   N = N
#' )
#' 
#' # optional: plot requires plotly package
#' # plot(cL1)
#' 
#' # for comparison
#' 
#' fittingFunction <- function(par, y, X, N, lambda, theta){
#'   pred <- X %*% matrix(par, ncol = 1)
#'   sse <- sum((y - pred)^2)
#'   smoothAbs <- sqrt(par^2 + 1e-8)
#'   pen <- lambda * ifelse(smoothAbs < theta, smoothAbs, theta)
#'   return((.5/N)*sse + sum(pen))
#' }
#' 
#' round(
#'   optim(par = b,
#'       fn = fittingFunction,
#'       y = y,
#'       X = X,
#'       N = N,
#'       lambda =  cL1@fits$lambda[15],
#'       theta =  cL1@fits$theta[15],
#'       method = "BFGS")$par,
#'   4)
#' cL1@parameters[15,]
#' @export
gpCappedL1 <- function(par,
                       fn,
                       gr = NULL,
                       ...,
                       regularized,
                       lambdas,
                       thetas,
                       method = "glmnet", 
                       control = lessSEM::controlGlmnet()){
  
  removeDotDotDot <- .noDotDotDot(fn, fnName = "fn", ... = ...)
  fn <- removeDotDotDot[[1]]
  additionalArguments <- removeDotDotDot$additionalArguments
  
  if(!is.null(gr)){
    
    removeDotDotDot <- .noDotDotDot(gr, fnName = "gr", ...)
    gr <- removeDotDotDot[[1]]
    additionalArguments <- c(additionalArguments, 
                             removeDotDotDot$additionalArguments[!names(removeDotDotDot$additionalArguments) %in% names(additionalArguments)])
  }
  
  # remove the ... stuff so that it does not interfere with anything else
  rm("...")
  
  if(any(thetas <= 0)) stop("Theta must be > 0")
  
  result <- .gpOptimizationInternal(par = par,
                                    fn = fn,
                                    gr = gr,
                                    additionalArguments = additionalArguments,
                                    penalty = "cappedL1", 
                                    weights = regularized,
                                    tuningParameters = expand.grid(lambda = lambdas, 
                                                                   theta = thetas,
                                                                   alpha = 1), 
                                    method = method,
                                    control = control
  )
  
  return(result)
  
}

#' gpLsp 
#' 
#' Implements lsp regularization for general purpose optimization problems.
#' The penalty function is given by:
#' 
#' The interface is similar to that of optim. Users have to supply a vector 
#' with starting values (important: This vector must have labels) and a fitting
#' function. This fitting functions must take a labeled vector with parameter
#' values as first argument. The remaining arguments are passed with the ... argument.
#' This is similar to optim.
#' 
#' The gradient function gr is optional. If set to NULL, the \pkg{numDeriv} package
#' will be used to approximate the gradients. Supplying a gradient function
#' can result in considerable speed improvements.   
#' 
#' lsp regularization:
#' 
#' * Candès, E. J., Wakin, M. B., & Boyd, S. P. (2008). Enhancing Sparsity by 
#' Reweighted l1 Minimization. Journal of Fourier Analysis and Applications, 14(5–6), 
#' 877–905. https://doi.org/10.1007/s00041-008-9045-x
#'  
#' For more details on GLMNET, see:
#' 
#' * Friedman, J., Hastie, T., & Tibshirani, R. (2010). 
#' Regularization Paths for Generalized Linear Models via Coordinate Descent. 
#' Journal of Statistical Software, 33(1), 1–20. https://doi.org/10.18637/jss.v033.i01
#' * Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., & Lin, C.-J. (2010).
#' A Comparison of Optimization Methods and Software for Large-scale 
#' L1-regularized Linear Classification. Journal of Machine Learning Research, 11, 3183–3234.
#' * Yuan, G.-X., Ho, C.-H., & Lin, C.-J. (2012). 
#' An improved GLMNET for l1-regularized logistic regression. 
#' The Journal of Machine Learning Research, 13, 1999–2030. https://doi.org/10.1145/2020408.2020421
#' 
#' For more details on ISTA, see:
#' 
#' * Beck, A., & Teboulle, M. (2009). A Fast Iterative Shrinkage-Thresholding 
#' Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1),
#' 183–202. https://doi.org/10.1137/080716542
#' * Gong, P., Zhang, C., Lu, Z., Huang, J., & Ye, J. (2013). 
#' A General Iterative Shrinkage and Thresholding Algorithm for Non-convex 
#' Regularized Optimization Problems. Proceedings of the 30th International 
#' Conference on Machine Learning, 28(2)(2), 37–45.
#' * Parikh, N., & Boyd, S. (2013). Proximal Algorithms. Foundations and 
#' Trends in Optimization, 1(3), 123–231.
#' 
#' @param par labeled vector with starting values
#' @param fn R function which takes the parameters AND their labels 
#' as input and returns the fit value (a single value)
#' @param gr R function which takes the parameters AND their labels
#' as input and returns the gradients of the objective function. 
#' If set to NULL, numDeriv will be used to approximate the gradients 
#' @param ... additional arguments passed to fn and gr
#' @param regularized vector with names of parameters which are to be regularized.
#' @param lambdas numeric vector: values for the tuning parameter lambda
#' @param thetas numeric vector: values for the tuning parameter theta
#' @param method which optimizer should be used? Currently implemented are ista
#' and glmnet. 
#' @param control used to control the optimizer. This element is generated with 
#' the controlIsta and controlGlmnet functions. See ?controlIsta and ?controlGlmnet
#' for more details.
#' @returns Object of class gpRegularized
#' @examples 
#' library(lessSEM)
#' set.seed(123)
#' 
#' # first, we simulate data for our
#' # linear regression.
#' N <- 100 # number of persons
#' p <- 10 # number of predictors
#' X <- matrix(rnorm(N*p),	nrow = N, ncol = p) # design matrix
#' b <- c(rep(1,4),
#'        rep(0,6)) # true regression weights
#' y <- X%*%matrix(b,ncol = 1) + rnorm(N,0,.2)
#' 
#' # First, we must construct a fiting function
#' # which returns a single value. We will use
#' # the residual sum squared as fitting function.
#' 
#' # Let's start setting up the fitting function:
#' fittingFunction <- function(par, y, X, N){
#'   # par is the parameter vector
#'   # y is the observed dependent variable
#'   # X is the design matrix
#'   # N is the sample size
#'   pred <- X %*% matrix(par, ncol = 1) #be explicit here:
#'   # we need par to be a column vector
#'   sse <- sum((y - pred)^2)
#'   # we scale with .5/N to get the same results as glmnet
#'   return((.5/N)*sse)
#' }
#' 
#' # let's define the starting values:
#' b <- c(solve(t(X)%*%X)%*%t(X)%*%y) # we will use the lm estimates
#' names(b) <- paste0("b", 1:length(b))
#' # names of regularized parameters
#' regularized <- paste0("b",1:p)
#' 
#' # optimize
#' lspPen <- gpLsp(
#'   par = b,
#'   regularized = regularized,
#'   fn = fittingFunction,
#'   lambdas = seq(0,1,.1),
#'   thetas = c(0.001, .5, 1),
#'   X = X,
#'   y = y,
#'   N = N
#' )
#' 
#' # optional: plot requires plotly package
#' # plot(lspPen)
#' 
#' # for comparison
#' 
#' fittingFunction <- function(par, y, X, N, lambda, theta){
#'   pred <- X %*% matrix(par, ncol = 1)
#'   sse <- sum((y - pred)^2)
#'   smoothAbs <- sqrt(par^2 + 1e-8)
#'   pen <- lambda * log(1.0 + smoothAbs / theta)
#'   return((.5/N)*sse + sum(pen))
#' }
#' 
#' round(
#'   optim(par = b,
#'       fn = fittingFunction,
#'       y = y,
#'       X = X,
#'       N = N,
#'       lambda =  lspPen@fits$lambda[15],
#'       theta =  lspPen@fits$theta[15],
#'       method = "BFGS")$par,
#'   4)
#' lspPen@parameters[15,]
#' @export
gpLsp <- function(par,
                  fn,
                  gr = NULL,
                  ...,
                  regularized,
                  lambdas,
                  thetas,
                  method = "glmnet", 
                  control = lessSEM::controlGlmnet()){
  removeDotDotDot <- .noDotDotDot(fn, fnName = "fn", ... = ...)
  fn <- removeDotDotDot[[1]]
  additionalArguments <- removeDotDotDot$additionalArguments
  
  if(!is.null(gr)){
    
    removeDotDotDot <- .noDotDotDot(gr, fnName = "gr", ...)
    gr <- removeDotDotDot[[1]]
    additionalArguments <- c(additionalArguments, 
                             removeDotDotDot$additionalArguments[!names(removeDotDotDot$additionalArguments) %in% names(additionalArguments)])
  }
  
  # remove the ... stuff so that it does not interfere with anything else
  rm("...")
  
  if(any(thetas <= 0)) stop("Theta must be > 0")
  
  result <- .gpOptimizationInternal(par = par,
                                    fn = fn,
                                    gr = gr,
                                    additionalArguments = additionalArguments,
                                    penalty = "lsp", 
                                    weights = regularized,
                                    tuningParameters = expand.grid(lambda = lambdas, 
                                                                   theta = thetas,
                                                                   alpha = 1), 
                                    method = method,
                                    control = control
  )
  
  return(result)
  
}

#' gpMcp 
#' 
#' Implements mcp regularization for general purpose optimization problems.
#' The penalty function is given by:
#' \ifelse{html}{\deqn{p( x_j) = \begin{cases}
#' \lambda |x_j| - x_j^2/(2\theta) & \text{if } |x_j| \leq \theta\lambda\\
#' \theta\lambda^2/2 & \text{if } |x_j| > \lambda\theta
#' \end{cases}} where \eqn{\theta > 0}.}{
#' Equation Omitted in Pdf Documentation.}
#' 
#' The interface is similar to that of optim. Users have to supply a vector 
#' with starting values (important: This vector _must_ have labels) and a fitting
#' function. This fitting functions _must_ take a labeled vector with parameter
#' values as first argument. The remaining arguments are passed with the ... argument.
#' This is similar to optim.
#' 
#' The gradient function gr is optional. If set to NULL, the \pkg{numDeriv} package
#' will be used to approximate the gradients. Supplying a gradient function
#' can result in considerable speed improvements.  
#' 
#' mcp regularization:
#' 
#' * Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. 
#' The Annals of Statistics, 38(2), 894–942. https://doi.org/10.1214/09-AOS729
#'  
#' For more details on GLMNET, see:
#' 
#' * Friedman, J., Hastie, T., & Tibshirani, R. (2010). 
#' Regularization Paths for Generalized Linear Models via Coordinate Descent. 
#' Journal of Statistical Software, 33(1), 1–20. https://doi.org/10.18637/jss.v033.i01
#' * Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., & Lin, C.-J. (2010).
#' A Comparison of Optimization Methods and Software for Large-scale 
#' L1-regularized Linear Classification. Journal of Machine Learning Research, 11, 3183–3234.
#' * Yuan, G.-X., Ho, C.-H., & Lin, C.-J. (2012). 
#' An improved GLMNET for l1-regularized logistic regression. 
#' The Journal of Machine Learning Research, 13, 1999–2030. https://doi.org/10.1145/2020408.2020421
#' 
#' For more details on ISTA, see:
#' 
#' * Beck, A., & Teboulle, M. (2009). A Fast Iterative Shrinkage-Thresholding 
#' Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1),
#' 183–202. https://doi.org/10.1137/080716542
#' * Gong, P., Zhang, C., Lu, Z., Huang, J., & Ye, J. (2013). 
#' A General Iterative Shrinkage and Thresholding Algorithm for Non-convex 
#' Regularized Optimization Problems. Proceedings of the 30th International 
#' Conference on Machine Learning, 28(2)(2), 37–45.
#' * Parikh, N., & Boyd, S. (2013). Proximal Algorithms. Foundations and 
#' Trends in Optimization, 1(3), 123–231.
#' 
#' @param par labeled vector with starting values
#' @param fn R function which takes the parameters AND their labels 
#' as input and returns the fit value (a single value)
#' @param gr R function which takes the parameters AND their labels
#' as input and returns the gradients of the objective function. 
#' If set to NULL, numDeriv will be used to approximate the gradients 
#' @param ... additional arguments passed to fn and gr
#' @param regularized vector with names of parameters which are to be regularized.
#' @param lambdas numeric vector: values for the tuning parameter lambda
#' @param thetas numeric vector: values for the tuning parameter theta
#' @param method which optimizer should be used? Currently implemented are ista
#' and glmnet. 
#' @param control used to control the optimizer. This element is generated with 
#' the controlIsta and controlGlmnet functions. See ?controlIsta and ?controlGlmnet
#' for more details.
#' @returns Object of class gpRegularized

#' @examples 
#' # This example shows how to use the optimizers
#' # for other objective functions. We will use
#' # a linear regression as an example. Note that
#' # this is not a useful application of the optimizers
#' # as there are specialized packages for linear regression
#' # (e.g., glmnet)
#' 
#' library(lessSEM)
#' set.seed(123)
#' 
#' # first, we simulate data for our
#' # linear regression.
#' N <- 100 # number of persons
#' p <- 10 # number of predictors
#' X <- matrix(rnorm(N*p),	nrow = N, ncol = p) # design matrix
#' b <- c(rep(1,4),
#'        rep(0,6)) # true regression weights
#' y <- X%*%matrix(b,ncol = 1) + rnorm(N,0,.2)
#' 
#' # First, we must construct a fiting function
#' # which returns a single value. We will use
#' # the residual sum squared as fitting function.
#' 
#' # Let's start setting up the fitting function:
#' fittingFunction <- function(par, y, X, N){
#'   # par is the parameter vector
#'   # y is the observed dependent variable
#'   # X is the design matrix
#'   # N is the sample size
#'   pred <- X %*% matrix(par, ncol = 1) #be explicit here:
#'   # we need par to be a column vector
#'   sse <- sum((y - pred)^2)
#'   # we scale with .5/N to get the same results as glmnet
#'   return((.5/N)*sse)
#' }
#' 
#' # let's define the starting values:
#' # first, let's add an intercept
#' X <- cbind(1, X)
#' 
#' b <- c(solve(t(X)%*%X)%*%t(X)%*%y) # we will use the lm estimates
#' names(b) <- paste0("b", 0:(length(b)-1))
#' # names of regularized parameters
#' regularized <- paste0("b",1:p)
#' 
#' # optimize
#' mcpPen <- gpMcp(
#'   par = b,
#'   regularized = regularized,
#'   fn = fittingFunction,
#'   lambdas = seq(0,1,.1),
#'   thetas = c(1.001, 1.5, 2),
#'   X = X,
#'   y = y,
#'   N = N
#' )
#' 
#' # optional: plot requires plotly package
#' # plot(mcpPen)
#' 
#' @export
gpMcp <- function(par,
                  fn,
                  gr = NULL,
                  ...,
                  regularized,
                  lambdas,
                  thetas,
                  method = "glmnet", 
                  control = lessSEM::controlGlmnet()){
  
  removeDotDotDot <- .noDotDotDot(fn, fnName = "fn", ... = ...)
  fn <- removeDotDotDot[[1]]
  additionalArguments <- removeDotDotDot$additionalArguments
  
  if(!is.null(gr)){
    
    removeDotDotDot <- .noDotDotDot(gr, fnName = "gr", ...)
    gr <- removeDotDotDot[[1]]
    additionalArguments <- c(additionalArguments, 
                             removeDotDotDot$additionalArguments[!names(removeDotDotDot$additionalArguments) %in% names(additionalArguments)])
  }
  
  # remove the ... stuff so that it does not interfere with anything else
  rm("...")
  
  if(any(thetas <= 0)) stop("Theta must be > 0")
  
  result <- .gpOptimizationInternal(par = par,
                                    fn = fn,
                                    gr = gr,
                                    additionalArguments = additionalArguments,
                                    penalty = "mcp", 
                                    weights = regularized,
                                    tuningParameters = expand.grid(lambda = lambdas, 
                                                                   theta = thetas,
                                                                   alpha = 1), 
                                    method = method,
                                    control = control
  )
  
  return(result)
  
}

#' gpScad 
#' 
#' Implements scad regularization for general purpose optimization problems.
#' The penalty function is given by:
#' \ifelse{html}{
#' \deqn{p( x_j) = \begin{cases}
#' \lambda |x_j| & \text{if } |x_j| \leq \theta\\
#' \frac{-x_j^2 + 2\theta\lambda |x_j| - \lambda^2}{2(\theta -1)} & 
#' \text{if } \lambda < |x_j| \leq \lambda\theta \\
#' (\theta + 1) \lambda^2/2 & \text{if } |x_j| \geq \theta\lambda\\
#' \end{cases}}
#' where \eqn{\theta > 2}.}{
#' Equation Omitted in Pdf Documentation.
#' }  
#' 
#' The interface is similar to that of optim. Users have to supply a vector 
#' with starting values (important: This vector _must_ have labels) and a fitting
#' function. This fitting functions _must_ take a labeled vector with parameter
#' values as first argument. The remaining arguments are passed with the ... argument.
#' This is similar to optim.
#' 
#' The gradient function gr is optional. If set to NULL, the \pkg{numDeriv} package
#' will be used to approximate the gradients. Supplying a gradient function
#' can result in considerable speed improvements.  
#' 
#' scad regularization:
#' 
#' * Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized 
#' likelihood and its oracle properties. Journal of the American Statistical Association, 
#' 96(456), 1348–1360. https://doi.org/10.1198/016214501753382273
#'  
#' For more details on GLMNET, see:
#' 
#' * Friedman, J., Hastie, T., & Tibshirani, R. (2010). 
#' Regularization Paths for Generalized Linear Models via Coordinate Descent. 
#' Journal of Statistical Software, 33(1), 1–20. https://doi.org/10.18637/jss.v033.i01
#' * Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., & Lin, C.-J. (2010).
#' A Comparison of Optimization Methods and Software for Large-scale 
#' L1-regularized Linear Classification. Journal of Machine Learning Research, 11, 3183–3234.
#' * Yuan, G.-X., Ho, C.-H., & Lin, C.-J. (2012). 
#' An improved GLMNET for l1-regularized logistic regression. 
#' The Journal of Machine Learning Research, 13, 1999–2030. https://doi.org/10.1145/2020408.2020421
#' 
#' For more details on ISTA, see:
#' 
#' * Beck, A., & Teboulle, M. (2009). A Fast Iterative Shrinkage-Thresholding 
#' Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1),
#' 183–202. https://doi.org/10.1137/080716542
#' * Gong, P., Zhang, C., Lu, Z., Huang, J., & Ye, J. (2013). 
#' A General Iterative Shrinkage and Thresholding Algorithm for Non-convex 
#' Regularized Optimization Problems. Proceedings of the 30th International 
#' Conference on Machine Learning, 28(2)(2), 37–45.
#' * Parikh, N., & Boyd, S. (2013). Proximal Algorithms. Foundations and 
#' Trends in Optimization, 1(3), 123–231.
#'  
#' @param par labeled vector with starting values
#' @param fn R function which takes the parameters AND their labels 
#' as input and returns the fit value (a single value)
#' @param gr R function which takes the parameters AND their labels
#' as input and returns the gradients of the objective function. 
#' If set to NULL, numDeriv will be used to approximate the gradients 
#' @param ... additional arguments passed to fn and gr
#' @param regularized vector with names of parameters which are to be regularized.
#' @param lambdas numeric vector: values for the tuning parameter lambda
#' @param thetas numeric vector: values for the tuning parameter theta
#' @param method which optimizer should be used? Currently implemented are ista
#' and glmnet. 
#' @param control used to control the optimizer. This element is generated with 
#' the controlIsta and controlGlmnet functions. See ?controlIsta and ?controlGlmnet
#' for more details.
#' @returns Object of class gpRegularized

#' @examples 
#' # This example shows how to use the optimizers
#' # for other objective functions. We will use
#' # a linear regression as an example. Note that
#' # this is not a useful application of the optimizers
#' # as there are specialized packages for linear regression
#' # (e.g., glmnet)
#' 
#' library(lessSEM)
#' set.seed(123)
#' 
#' # first, we simulate data for our
#' # linear regression.
#' N <- 100 # number of persons
#' p <- 10 # number of predictors
#' X <- matrix(rnorm(N*p),	nrow = N, ncol = p) # design matrix
#' b <- c(rep(1,4),
#'        rep(0,6)) # true regression weights
#' y <- X%*%matrix(b,ncol = 1) + rnorm(N,0,.2)
#' 
#' # First, we must construct a fiting function
#' # which returns a single value. We will use
#' # the residual sum squared as fitting function.
#' 
#' # Let's start setting up the fitting function:
#' fittingFunction <- function(par, y, X, N){
#'   # par is the parameter vector
#'   # y is the observed dependent variable
#'   # X is the design matrix
#'   # N is the sample size
#'   pred <- X %*% matrix(par, ncol = 1) #be explicit here:
#'   # we need par to be a column vector
#'   sse <- sum((y - pred)^2)
#'   # we scale with .5/N to get the same results as glmnet
#'   return((.5/N)*sse)
#' }
#' 
#' # let's define the starting values:
#' # first, let's add an intercept
#' X <- cbind(1, X)
#' 
#' b <- c(solve(t(X)%*%X)%*%t(X)%*%y) # we will use the lm estimates
#' names(b) <- paste0("b", 0:(length(b)-1))
#' # names of regularized parameters
#' regularized <- paste0("b",1:p)
#' 
#' # optimize
#' scadPen <- gpScad(
#'   par = b,
#'   regularized = regularized,
#'   fn = fittingFunction,
#'   lambdas = seq(0,1,.1),
#'   thetas = c(2.001, 2.5, 5),
#'   X = X,
#'   y = y,
#'   N = N
#' )
#' 
#' # optional: plot requires plotly package
#' # plot(scadPen)
#' 
#' # for comparison
#' #library(ncvreg)
#' #scadFit <- ncvreg(X = X[,-1], 
#' #                  y = y, 
#' #                  penalty = "SCAD",
#' #                  lambda =  scadPen@fits$lambda[15],
#' #                  gamma =  scadPen@fits$theta[15])
#' #coef(scadFit)
#' #scadPen@parameters[15,]
#' @export
gpScad <- function(par,
                   fn,
                   gr = NULL,
                   ...,
                   regularized,
                   lambdas,
                   thetas,
                   method = "glmnet", 
                   control = lessSEM::controlGlmnet()){
  
  removeDotDotDot <- .noDotDotDot(fn, fnName = "fn", ... = ...)
  fn <- removeDotDotDot[[1]]
  additionalArguments <- removeDotDotDot$additionalArguments
  
  if(!is.null(gr)){
    
    removeDotDotDot <- .noDotDotDot(gr, fnName = "gr", ...)
    gr <- removeDotDotDot[[1]]
    additionalArguments <- c(additionalArguments, 
                             removeDotDotDot$additionalArguments[!names(removeDotDotDot$additionalArguments) %in% names(additionalArguments)])
  }
  
  # remove the ... stuff so that it does not interfere with anything else
  rm("...")
  
  if(any(thetas <= 2)) stop("Theta must be > 2")
  
  result <- .gpOptimizationInternal(par = par,
                                    fn = fn,
                                    gr = gr,
                                    additionalArguments = additionalArguments,
                                    penalty = "scad", 
                                    weights = regularized,
                                    tuningParameters = expand.grid(lambda = lambdas, 
                                                                   theta = thetas,
                                                                   alpha = 1), 
                                    method = method,
                                    control = control
  )
  
  return(result)
}
Any scripts or data that you put into this service are public.
lessSEM documentation built on May 29, 2024, 7:10 a.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
lessSEM
Non-Smooth Regularization for Structural Equation Models

R/gpOptimizationInterface.R
In lessSEM: Non-Smooth Regularization for Structural Equation Models

Defines functions gpScad gpMcp gpLsp gpCappedL1 gpElasticNet gpRidge gpAdaptiveLasso gpLasso

Documented in gpAdaptiveLasso gpCappedL1 gpElasticNet gpLasso gpLsp gpMcp gpRidge gpScad

Try the lessSEM package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

lessSEM Non-Smooth Regularization for Structural Equation Models

R/gpOptimizationInterface.R In lessSEM: Non-Smooth Regularization for Structural Equation Models

Defines functions gpScad gpMcp gpLsp gpCappedL1 gpElasticNet gpRidge gpAdaptiveLasso gpLasso

Documented in gpAdaptiveLasso gpCappedL1 gpElasticNet gpLasso gpLsp gpMcp gpRidge gpScad

Try the lessSEM package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

lessSEM
Non-Smooth Regularization for Structural Equation Models

R/gpOptimizationInterface.R
In lessSEM: Non-Smooth Regularization for Structural Equation Models