R/SieveSGDFunction.R


#' Preprocess the original data for sieve-SGD estimation.
#' 
#' @param X a data frame containing prediction features/independent variables. The (i,j)-th element is the j-th dimension of the i-th sample's feature vector. 
#' So the number of rows equals the sample size and the number of columns equals the feature/covariate dimension. If the complete data set is large, this can be a representative subset of it (ideally with more than 1000 samples). 
#' @param s numerical array. Smoothness parameter; a smaller s corresponds to a more flexible model. Default is 2. The elements of this array should take values greater than 0.5. The larger s is, the smoother we are assuming the truth to be.
#' @param r0 numerical array. Initial learning rate/step size; don't set it too large. Default is 2. The step size at each iteration will be r0*(sample size)^(-1/(2s+1)), which is slowly decaying.
#' @param J numerical array. Initial number of basis functions; a larger J corresponds to a more flexible estimator. Default is 1. The number of basis functions at each iteration will be J*(sample size)^(1/(2s+1)), which is slowly increasing. We recommend using a J that is at least the dimension of the predictor, i.e., the number of columns of the X matrix.
#' @param type a string. It specifies what kind of basis functions are used. The default is (aperiodic) cosine basis functions ('cosine'), which is enough for generic usage.
#' @param interaction_order a number. It also controls the model complexity. 1 means fitting an additive model, 2 means fitting a model that allows interactions between at most 2 dimensions of the feature, 3 means interactions between at most 3 dimensions, etc. The default is 3. 
#' For large sample size, lower dimensional problems, try a larger value (but it should not exceed the dimension of the original features); for smaller sample size, higher dimensional problems, try setting it to a smaller value (1 or 2).
#' @param omega the rate of the dimension-reduction parameter. Default is 0.51 and usually does not need to be changed.
#' @param norm_feature a logical variable. Default is TRUE. If TRUE, sieve.sgd.preprocess will rescale each dimension of the features to between 0 and 1. Only set it to FALSE when the user has already manually rescaled them to between 0 and 1.
#' @param norm_para a matrix. It specifies how the features are normalized. For training data, use the default value NULL.
#' @param lower_q lower quantile used in normalization. Default is 0.005 (0.5\% quantile).
#' @param upper_q upper quantile used in normalization. Default is 0.995 (99.5\% quantile).
#' 
#' @return A list containing the necessary information for the next step of model fitting. Typically, the list is used as the main input of sieve.sgd.solver.
#' \item{s.size.sofar}{a number. Number of samples processed so far.}
#' \item{type}{a string. The type of basis function.}
#' \item{hyper.para.list}{a list of hyperparameters.}
#' \item{index.matrix}{a matrix. Identifies the multivariate basis functions used in fitting.}
#' \item{index.row.prod}{the index product for each basis function. It is used in calculating basis function-specific learning rates.}
#' \item{inf.list}{a list storing the fitted results. Its length equals the number of unique combinations of the hyperparameters. Each component of inf.list is itself a list: its hyper.para.index field specifies the corresponding hyperparameters (to be used together with hyper.para.list); its rolling.cv field is the progressive validation statistic for hyperparameter tuning; beta.f contains the regression coefficients for the first length(beta.f) basis functions, and the remaining basis functions have 0 coefficients.}
#' \item{norm_para}{a matrix. It records how each dimension of the feature/predictor is rescaled, which is useful when rescaling the testing sample's predictors.}

#' @examples 
#' xdim <- 1 #1 dimensional feature
#' #generate 1000 training samples
#' TrainData <- GenSamples(s.size = 1000, xdim = xdim)
#' sieve.model <- sieve.sgd.preprocess(X = TrainData[,2:(xdim+1)])
#' @export
#'
sieve.sgd.preprocess <- function(X, s = c(2), r0 = c(2), J = c(1), type = c('cosine'), 
                             interaction_order = c(3), omega = c(0.51),
                             norm_feature = TRUE, norm_para = NULL,
                             lower_q = 0.005, upper_q = 0.995){
  
  #s.size is sample size
  if(is(X,'numeric') | is(X, 'factor')){
    s.size <- length(X) #this is a univariate regression problem; sometimes the input data.frame is automatically converted to an array
  }
  else{
    s.size <- dim(X)[1]
  }
  
  
  X <- sapply(X,unclass) #convert categorical variables into numeric variables
  X <- matrix(X, nrow = s.size)
  
  if(norm_feature){
    norm_list <- normalize_X(X, norm_para = norm_para, 
                                     lower_q = lower_q, upper_q = upper_q)
    # X <- norm_list$X
    norm_para <- norm_list$norm_para
  }
  
  #####set the number of observations counter back to 0
  s.size.sofar <- 0
  
  #####hyperparameter list
  hyper.para.list <- list(s = s, r0 = r0,
                          J = J, interaction_order = interaction_order,
                          omega = omega) #omega is related to a basis function-specific learning rate
  
  #####generate hyperparameter index matrix
  hyper.para.index.matrix <- expand.grid(1:length(s), 1:length(r0), 1:length(J),
                                         1:length(interaction_order), 1:length(omega))
  colnames(hyper.para.index.matrix) <- c('s', 'r0', 'J', 'interaction_order', 'omega')
  
  #####generate basis index matrix
  max.basisN <- ceiling(max(J) * s.size^(1/(2*min(s) + 1))) #the maximum number of basis functions needed to process all the data in X
  xdim <- dim(X)[2] #dimension of predictors
  index_matrix <- as.matrix(create_index_matrix(xdim, basisN = max.basisN, #the number of rows of index_matrix is specified by basisN (not xdim)
                                                interaction_order = max(interaction_order))[1:max.basisN,], 
                            ncol = xdim)
  index.row.prod <- index_matrix[, 1]#index product will be used when determining the basis function-specific learning rate
  index_matrix <- as.matrix(index_matrix[, -1], ncol = xdim)
  
  ####generate the list storing fitted beta and rolling cross-validation stat
  M <- dim(hyper.para.index.matrix)[1] #number of hyperparameter combinations
  inf.list <- vector("list", M) #for each combination of hyperparameters, create a list for storing the estimated beta, etc.
  for(i in 1:M){
    tmp.inf <- list(hyper.para.index = hyper.para.index.matrix[i,],
                    rolling.cv = 0,
                    beta.f.int = 0, #beta of the internal fhat
                    beta.f = 0) #this is the estimator's beta  
    inf.list[[i]] <- tmp.inf
  }
  
  return(list(s.size.sofar = s.size.sofar,
              type = type,
              hyper.para.list = hyper.para.list,
              index.matrix = index_matrix,
              index.row.prod = index.row.prod,
              inf.list = inf.list,
              norm.para = norm_para))
}
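
# A minimal sketch (not part of the package API, following the schedules
# documented above): how the hyperparameters s, r0 and J translate into the
# per-iteration step size r0*n^(-1/(2s+1)) and number of basis functions
# J*n^(1/(2s+1)) as the sample size n grows.
# sgd.schedule <- function(n, s = 2, r0 = 2, J = 1){
#   list(step.size = r0 * n^(-1/(2*s + 1)),       #slowly decaying learning rate
#        n.basis = ceiling(J * n^(1/(2*s + 1))))  #slowly increasing basis count
# }
# sgd.schedule(1e3) #with the defaults: step size about 0.5, 4 basis functions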

#' Fit sieve-SGD estimators, using progressive validation for hyperparameter tuning.
#' 
#' @param sieve.model a list initiated using sieve.sgd.preprocess. Check the documentation of sieve.sgd.preprocess for more information.
#' @param X a data frame containing prediction features/ independent variables. 
#' @param Y training outcome.
#' @param cv_weight_rate this governs the divergence rate of rolling validation statistics. Default is set to be 1 and in general does not need to be changed.
#' 
#' @return A list. It contains the fitted regression coefficients and progressive validation statistics for each hyperparameter combination.
#' \item{s.size.sofar}{a number. Number of samples processed so far.}
#' \item{type}{a string. The type of basis function.}
#' \item{hyper.para.list}{a list of hyperparameters.}
#' \item{index.matrix}{a matrix. Identifies the multivariate basis functions used in fitting.}
#' \item{index.row.prod}{the index product for each basis function. It is used in calculating basis function-specific learning rates.}
#' \item{inf.list}{a list storing the fitted results. Its length equals the number of unique combinations of the hyperparameters. Each component of inf.list is itself a list: its hyper.para.index field specifies the corresponding hyperparameters (to be used together with hyper.para.list); its rolling.cv field is the progressive validation statistic for hyperparameter tuning; beta.f contains the regression coefficients for the first length(beta.f) basis functions, and the remaining basis functions have 0 coefficients.}
#' \item{norm_para}{a matrix. It records how each dimension of the feature/predictor is rescaled, which is useful when rescaling the testing sample's predictors.}

#' @examples 
#' frho.para <- xdim <- 1 ##predictor dimension
#' frho <- 'additive' ###truth is a sum of absolute functions 
#' type <- 'cosine' ###use cosine functions as the basis functions
#' #generate training data
#' TrainData <- GenSamples(s.size = 1e3, xdim = xdim, 
#'                                 frho.para = frho.para, 
#'                                 frho = frho, noise.para = 0.1)
#' #preprocess the model
#' sieve.model <- sieve.sgd.preprocess(X = TrainData[,2:(xdim+1)], 
#'                                     type = type,
#'                                     s = c(1,2),
#'                                     r0 = c(0.5, 2, 4),
#'                                     J = c(1, 4, 8))
#' 
#' ##train the model
#' sieve.model <- sieve.sgd.solver(sieve.model = sieve.model, 
#'                                 X = TrainData[,2:(xdim+1)], 
#'                                 Y  = TrainData[,1])
#' 
#' ##sieve-SGD can do multiple passes over the data, just like other SGD methods.
#' ##usually a second pass can still improve the prediction accuracy.
#' ##watch out for overfitting when performing multiple passes!
#' sieve.model <- sieve.sgd.solver(sieve.model = sieve.model, 
#'                               X = TrainData[,2:(xdim+1)], 
#'                               Y  = TrainData[,1])

#' @export
#' 
sieve.sgd.solver <- function(sieve.model, X, Y,
                             cv_weight_rate = 1){
  ####this function processes the training X, Y using sieve-SGD
  ####it also uses rolling cross-validation to choose hyperparameters
  
  #####normalize the X. Make sure the format of everything is correct.
  #s.size is sample size
  if(is(X,'numeric') | is(X, 'factor')){
    s.size <- length(X) #this is a univariate regression problem; sometimes the input data.frame is automatically converted to an array
  }else{
    s.size <- dim(X)[1]
  }
  
  X <- sapply(X,unclass) #convert categorical variables into numeric variables
  X <- matrix(X, nrow = s.size)
  
  norm_para <- sieve.model$norm.para
  if(is.null(norm_para)){
    warning('there is no normalization parameter for X, please use sieve.sgd.preprocess to preprocess the data.')
  }
  
  norm_list <- normalize_X(X, norm_para = norm_para)
  X <- norm_list$X
  
  #####feed in the data one by one
  J <- sieve.model$hyper.para.list$J
  s <- sieve.model$hyper.para.list$s
  r0 <- sieve.model$hyper.para.list$r0
  omega <- sieve.model$hyper.para.list$omega
  interaction_order <- sieve.model$hyper.para.list$interaction_order
  type <- sieve.model$type
  M <- length(sieve.model$inf.list)
  s.size.sofar <- sieve.model$s.size.sofar
  
  if(s.size.sofar == 0){ #first time process the data
    index_matrix <- sieve.model$index.matrix
    index.row.prod <- sieve.model$index.row.prod
  }else{
    print('not training from the beginning')  
    
    #####regenerate basis index matrix
    max.basisN <- ceiling(max(J) * (s.size + s.size.sofar)^(1/(2*min(s) + 1))) #the maximum number of basis functions needed to process all the data in X
    xdim <- dim(X)[2] #dimension of predictors
    index_matrix <- as.matrix(create_index_matrix(xdim, basisN = max.basisN, #the number of rows of index_matrix is specified by basisN (not xdim)
                                                          interaction_order = max(interaction_order))[1:max.basisN,], 
                              ncol = xdim)
    index.row.prod <- index_matrix[, 1]#index product will be used when determining the basis function-specific learning rate
    index_matrix <- as.matrix(index_matrix[, -1], ncol = xdim)
    
    
    sieve.model$index.matrix <- index_matrix
    sieve.model$index.row.prod <- index.row.prod
  }
  
  for(i in 1:s.size){
    
    i.sofar <- i + s.size.sofar #first pass, i.sofar = i
    
    newx <- matrix(X[i,], nrow = 1)
    newy <- Y[i]
    
    max.J <- ceiling(max(J) * (i.sofar)^(1/(2*min(s) + 1))) #this is the maximum number of basis functions that need to be estimated at this step
    
    Phi <- Design_M_C(newx, max.J, type, index_matrix) #one row design "matrix"
    
    for(m in 1:M){
      
      #####calculate rolling CV
      
      ###number of basis functions for this hyper-para combination
      J.m <- J[sieve.model$inf.list[[m]]$hyper.para.index$J] * (i.sofar)^(1/(2* s[sieve.model$inf.list[[m]]$hyper.para.index$s]+ 1))
      J.m <- ceiling(J.m)
      
      beta.f <- sieve.model$inf.list[[m]]$beta.f
      beta.f.int <- sieve.model$inf.list[[m]]$beta.f.int
      
      if(length(beta.f) < J.m){ #check if more basis are used
        beta.f <- c(beta.f, rep(0, J.m - length(beta.f)))
        beta.f.int <- c(beta.f.int, rep(0, J.m - length(beta.f.int)))
      }
      
      fnewx <- crossprod(beta.f, Phi[1:J.m]) ###f_i(X_{i+1})
      sieve.model$inf.list[[m]]$rolling.cv <- sieve.model$inf.list[[m]]$rolling.cv + i.sofar^(cv_weight_rate) * (newy - fnewx)^2
      
      ##########update beta's
      ###overall learning rate
      rn.m <- r0[sieve.model$inf.list[[m]]$hyper.para.index$r0] * (i.sofar)^(-1/(2* s[sieve.model$inf.list[[m]]$hyper.para.index$s]+ 1))
      
      #####CORE UPDATING STEPS
      fnewx.int <- crossprod(beta.f.int, Phi[1:J.m])
      res <- as.numeric(newy - fnewx.int)
      
      beta.f.int <- beta.f.int + rn.m * res * (index.row.prod[1:J.m])^(-2*omega)*Phi[1:J.m]
      
      beta.f <- (i.sofar-1)/i.sofar * beta.f + beta.f.int/i.sofar
      ################
      
      sieve.model$inf.list[[m]]$beta.f <- beta.f
      sieve.model$inf.list[[m]]$beta.f.int <- beta.f.int
      
    }
  }
  #update the current number of samples processed 
  sieve.model$s.size.sofar <- s.size.sofar + s.size
  
  # ######model comparison
  # rolling.cvs <- rep(1e10, M)
  # for(m in 1:M){
  #   rolling.cvs[m] <- sieve.model$inf.list[[m]]$rolling.cv
  # }
  # print(rolling.cvs/sieve.model$s.size.sofar)
  # print(which.min(rolling.cvs))
  sieve.model <- clean_up_result(sieve.model)
  return(sieve.model)

}
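
# A minimal sketch of the CORE UPDATING STEPS inside sieve.sgd.solver,
# isolated from the bookkeeping. The arguments (beta.int, beta.avg, y.i,
# Phi.i, index.prod, i) are placeholders for the quantities computed in the
# solver loop above; the update is one stochastic gradient step on the
# internal iterate followed by Polyak-Ruppert averaging.
# sgd.update <- function(beta.int, beta.avg, y.i, Phi.i, index.prod,
#                        i, s = 2, r0 = 2, omega = 0.51){
#   rn <- r0 * i^(-1/(2*s + 1)) #overall learning rate at step i
#   res <- as.numeric(y.i - crossprod(beta.int, Phi.i)) #residual of the internal estimator
#   beta.int <- beta.int + rn * res * (index.prod)^(-2*omega) * Phi.i #basis function-specific rates
#   beta.avg <- (i - 1)/i * beta.avg + beta.int/i #running average of the iterates
#   list(beta.int = beta.int, beta.avg = beta.avg)
# }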

# xdim <- 2
# frho <- 'lineartensor'
# frho <- 'additive'
# TrainData <- GenSamples(s.size = 1e4, xdim = xdim, frho.para = 1, frho = frho, noise.para = 0.01)
# ###noise.para = 0.17 for lineartensor, snr 30, xdim = D = 4; =0.54, snr = 3
# X <- TrainData[,2:(xdim+1)]
# Y <- TrainData[,1]
# sieve.model <- sieve.sgd.preprocess(X = TrainData[,2:(xdim+1)], type = type,
#                                     s = c(1,2),
#                                     r0 = c(0.5, 2, 4),
#                                     J = c(0.5, 2, 4))
# sieve.model = sieve.sgd.solver(sieve.model = sieve.model, X = X, Y  = Y)
# sieve.model$inf.list[[13]]$hyper.para.index
# sieve.model$inf.list[[2]]
# sieve.model$inf.list[[4]]

# TestData <- GenSamples(s.size = 5e3, xdim = xdim, frho.para = 1, frho = frho, noise.para = 0)
# # X <- TestData[,2:(xdim+1)]
# # Y <- TestData[,1]
# sieve.model <- sieve.sgd.predict(sieve.model, TestData[,2:(xdim+1)])
# 
# mean((TestData[,1] - sieve.model$inf.list[[9]]$prdy)^2)
# mean((Y - sieve.model$inf.list[[15]]$prdy)^2)
# mean((Y - sieve.model$inf.list[[6]]$prdy)^2)
# plot(X, sieve.model$inf.list[[15]]$prdy)


#' Sieve-SGD makes prediction with new predictors.
#' 
#' @param sieve.model a list initiated using sieve.sgd.preprocess and sieve.sgd.solver. Check the documentation of sieve.sgd.preprocess for more information.
#' @param X a data frame containing prediction features/ independent variables. 

#' @return sieve.sgd.predict will update the given sieve.model input list.
#' \item{inf.list}{In each entry of the list inf.list, the array prdy is the predicted outcome under the given hyperparameter combination.}

#' @examples 
#' frho.para <- xdim <- 1 ##predictor dimension
#' frho <- 'additive' ###truth is a sum of absolute functions 
#' type <- 'cosine' ###use cosine functions as the basis functions
#' #generate training data
#' TrainData <- GenSamples(s.size = 1e3, xdim = xdim, 
#'                                 frho.para = frho.para, 
#'                                 frho = frho, noise.para = 0.1)
#' #preprocess the model
#' sieve.model <- sieve.sgd.preprocess(X = TrainData[,2:(xdim+1)], 
#'                                     type = type,
#'                                     s = c(1,2),
#'                                     r0 = c(0.5, 2, 4),
#'                                     J = c(1, 4, 8))
#' 
#' ##train the model
#' sieve.model <- sieve.sgd.solver(sieve.model = sieve.model, 
#'                                 X = TrainData[,2:(xdim+1)], 
#'                                 Y  = TrainData[,1])
#' ##generate new data
#' NewData <- GenSamples(s.size = 5e2, xdim = xdim, 
#'                       frho.para = frho.para, 
#'                       frho = frho, noise.para = 0.1)
#' sieve.model <- sieve.sgd.predict(sieve.model, X = NewData[, 2:(xdim+1)])
#' plot(NewData[, 2:(xdim+1)], sieve.model$best_model$prdy)
#' @export
#' 
sieve.sgd.predict <- function(sieve.model, X){
  
  #####normalize the X. Make sure the format of everything is correct.
  #s.size is sample size
  if(is(X,'numeric') | is(X, 'factor')){
    s.size <- length(X) #this is a univariate regression problem; sometimes the input data.frame is automatically converted to an array
  }else{
    s.size <- dim(X)[1]
  }
  
  X <- sapply(X,unclass) #convert categorical variables into numeric variables
  X <- matrix(X, nrow = s.size)
  
  norm_para <- sieve.model$norm.para
  if(is.null(norm_para)){
    warning('there is no normalization parameter for X, please use sieve.sgd.preprocess to preprocess the data.')
  }
  
  norm_list <- normalize_X(X, norm_para = norm_para)
  X <- norm_list$X
  
  #####extract the fitted model information
  J <- sieve.model$hyper.para.list$J
  s <- sieve.model$hyper.para.list$s
  # r0 <- sieve.model$hyper.para.list$r0
  # omega <- sieve.model$hyper.para.list$omega
  index_matrix <- sieve.model$index.matrix
  # index.row.prod <- sieve.model$index.row.prod
  type <- sieve.model$type
  M <- length(sieve.model$inf.list)
  
  for(m in 1:M){
    #determine how many basis functions are needed
    J.m <- length(sieve.model$inf.list[[m]]$beta.f)
    
    Phi <- Design_M_C(X, J.m, type, index_matrix)
    
    tmp.prdy <- crossprod_C(Phi, matrix(sieve.model$inf.list[[m]]$beta.f))
    
    sieve.model$inf.list[[m]]$prdy <- tmp.prdy
  }
  
  sieve.model <- clean_up_result(sieve.model)
  print('find the best model prediction at')
  print('sieve.model$best_model$prdy')
  return(sieve.model)
  
}
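
# After prediction, sieve.model$inf.list[[m]]$prdy holds the predicted
# outcomes under hyperparameter combination m, and the selected model's
# predictions are in sieve.model$best_model$prdy. A sketch of comparing all
# combinations on held-out data (TestY is a placeholder for the true outcomes):
# test.mse <- sapply(sieve.model$inf.list,
#                    function(fit) mean((TestY - fit$prdy)^2))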

#' Clean up the fitted model
#' 
#' @param sieve.model a sieve sgd model.
#' @return a processed sieve.model, with basis function names added and the best model (selected by rolling validation) stored in sieve.model$best_model.
#' @export
clean_up_result <- function(sieve.model){
  index.matrix <- sieve.model$index.matrix
  index.row.prod <- sieve.model$index.row.prod
  basis_type <- sieve.model$type
  name_vector <- rep(NA, nrow(index.matrix))
  
  ##determine which model is the best according to rolling validation
  collect_cv_stat <- function(sieve.model){
    num_of_hyperpara <- length(sieve.model$inf.list)
    all_cv_stat <- NULL
    for(curr_hyperpara in 1:num_of_hyperpara){
      all_cv_stat <- c(all_cv_stat, sieve.model$inf.list[[curr_hyperpara]]$rolling.cv)
    }
    print('the best model index is')
    print(which.min(all_cv_stat))
    print('find it at sieve.model$best_model')
    return(which.min(all_cv_stat))
  }
  
  best_model_index <- collect_cv_stat(sieve.model)
  sieve.model$best_model <- sieve.model$inf.list[[best_model_index]]
  
  
  ####give the basis functions their names
  spell_out_one_cosine_basis <- function(instruction_vector){
    basis_function_string <- NULL
    for(covariate_dimension in 1:length(instruction_vector)){
      
      if(instruction_vector[covariate_dimension] > 1){
        basis_function_string <- paste0(basis_function_string, '*cos(pi*',instruction_vector[covariate_dimension]- 1, '*x[', covariate_dimension,'])')
      }
    }
    if(is.null(basis_function_string)){
      basis_function_string <- '1'
    }else{
      basis_function_string <- gsub("^\\*(.+)", "\\1", basis_function_string)
    }
    
    return(basis_function_string)
  }
  
  if(basis_type == 'cosine'){
    for(basis_index in 1:length(index.row.prod)){
      instruction_vector <- index.matrix[basis_index,]
      name_vector[basis_index] <- spell_out_one_cosine_basis(instruction_vector)
    }
  }
  
  names(sieve.model$best_model$beta.f) <- name_vector[1:length(sieve.model$best_model$beta.f)]
  return(sieve.model)
}
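
# An illustration of the naming convention implemented above: an index row of
# all 1's corresponds to the constant basis function '1', and an entry k > 1
# in dimension d contributes a cos(pi*(k-1)*x[d]) factor. For example, the
# index row c(3, 1, 2) is named 'cos(pi*2*x[1])*cos(pi*1*x[3])'.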

######things i need to do
####prevent float number overflow
####able to process the data several times 
####other kinds of loss functions
####a better summary report. better documentation of the code.
####dynamically generating extra index matrix 
