reweightData: Function to Reweight Data

View source: R/utilityFunctions.R

reweightData    R Documentation

Function to Reweight Data

Description

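Subsets or reweights the observations in a data set of functional and scalar variables (for example, to build resampled data sets) without subsetting the entries that hold the observed grid points.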

Usage

reweightData(
  data,
  argvals,
  vars,
  longvars = NULL,
  weights,
  index,
  idvars = NULL,
  compress = FALSE
)

Arguments

data

a named list or data.frame.

argvals

character (vector); name(s) for entries in data giving the index for observed grid points; must be supplied if vars is not supplied.

vars

character (vector); name(s) for entries in data, which are subsetted according to weights or index. Must be supplied if argvals is not supplied.

longvars

variables in long format, e.g., a response that is observed on curve-specific grids.

weights

vector of weights for observations. Must be supplied if index is not supplied.

index

vector of indices for observations. Must be supplied if weights is not supplied.

idvars

character (vector); name(s) of the index variable(s) needed to expand vars so that they conform to the hmatrix structure when using bhistx base-learners, or to the variables in long format specified in longvars.

compress

logical; whether hmatrix objects are saved in compressed form. Default is FALSE (see Usage). Should be set to FALSE when using reweightData for nested resampling.
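The following base-R sketch shows, conceptually, why a curve identifier (as supplied through idvars) is needed when curves are resampled while some variables (as in longvars) are stored in long format: every long-format row belonging to a selected curve must be carried along. The helper resample_long() and its renumbering of duplicated curves are hypothetical and not taken from FDboost; reweightData's internal handling of longvars and idvars may differ.

```r
# Hypothetical illustration, not FDboost's code: resample curves stored in long format.
# id[k] gives the curve that long-format observation k belongs to.
resample_long <- function(y_long, id, index) {
  # for each selected curve, collect the positions of its long-format rows
  rows <- lapply(index, function(i) which(id == i))
  list(
    y  = y_long[unlist(rows)],
    # assumption: selected curves are renumbered 1, 2, ... so duplicates stay distinct
    id = rep(seq_along(index), times = lengths(rows))
  )
}

# curve 1 has 2 points, curve 2 has 1 point, curve 3 has 3 points
y_long <- c(1.1, 1.2, 2.1, 3.1, 3.2, 3.3)
id     <- c(1, 1, 2, 3, 3, 3)
res <- resample_long(y_long, id, index = c(3, 1, 3))
print(res$id)
```

Curve 3 is selected twice, so its three long-format rows appear twice in res$y, under the new ids 1 and 3.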

Details

reweightData subsets the rows of matrices and the elements of vectors using either the index or the weights argument. List entries that serve as the time index for the observed grid points of each trajectory of the functional observations must not be subsetted; supply their names via the argvals argument (a character vector) to exclude them. If argvals is not supplied, vars must be supplied, and argvals is assumed to be names(data)[!names(data) %in% vars], i.e., every entry not listed in vars.
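A minimal base-R sketch of the subsetting rule described above, assuming a plain list whose observation-level entries are either matrices (one row per observation) or vectors (one element per observation). The helper subset_obs() is hypothetical and is not FDboost's implementation; it only shows that entries named in vars are indexed while the argvals entry, inferred as in the text, is returned unchanged.

```r
# Hypothetical helper illustrating the rule; not FDboost's implementation.
subset_obs <- function(data, vars, index) {
  for (v in vars) {
    x <- data[[v]]
    # matrices: select rows (observations); vectors: select elements
    data[[v]] <- if (is.matrix(x)) x[index, , drop = FALSE] else x[index]
  }
  data  # entries not listed in vars (the argvals) are returned unchanged
}

toy <- list(
  y    = matrix(1:12, nrow = 3),   # 3 curves observed on 4 grid points
  x    = c(10, 20, 30),            # one scalar covariate per curve
  time = c(0.1, 0.2, 0.3, 0.4)     # grid points: must not be subsetted
)
vars <- c("y", "x")
# argvals inferred as in the Details: every entry not listed in vars
argvals <- names(toy)[!names(toy) %in% vars]
res <- subset_obs(toy, vars, index = c(1, 1, 3))
str(res)
```

Observation 1 appears twice and observation 2 is dropped, while res$time still has all four grid points.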

When using weights, a weight vector of length N must be supplied, where N is the number of observations. When using index, the vector must contain the index of each row as many times as that row should appear in the new data set.
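Assuming the weights are non-negative integer counts, the two arguments describe the same selection: a weight vector can be expanded into an index vector with rep(). The helper weights_to_index() is hypothetical; the sketch illustrates the relationship rather than reproducing FDboost's code. Under this reading, the weights c(0, 32, 32, rep(0, 61)) used in the Examples select observations 2 and 3, 32 times each.

```r
# Hypothetical helper; assumes one non-negative integer weight per observation.
weights_to_index <- function(weights) {
  stopifnot(all(weights >= 0), all(weights == round(weights)))
  # observation i is repeated weights[i] times
  rep(seq_along(weights), times = weights)
}

w <- c(0, 2, 1, 0)
idx <- weights_to_index(w)
print(idx)
# counting how often each observation occurs recovers the weights
print(tabulate(idx, nbins = length(w)))
```
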

Value

A list with the reweighted or subsetted data.

Author(s)

David Ruegamer, Sarah Brockhaus

Examples

## load data and keep the curves up to time 101
data("viscosity", package = "FDboost")
interval <- "101"
end <- which(viscosity$timeAll == as.numeric(interval))
viscosity$vis <- log(viscosity$visAll[ , 1:end])  # log-viscosity, one row per curve
viscosity$time <- viscosity$timeAll[1:end]        # observed grid points

## what does data look like
str(viscosity)

## do some reweighting
# correct weights: keep observations 2 and 3, each 32 times
str(visNew <- reweightData(viscosity, vars=c("vis", "T_C", "T_A", "rspeed", "mflow"), 
    argvals = "time", weights = c(0, 32, 32, rep(0, 61))))
# check the result
# visNew$vis[1:5, 1:5] ## image(visNew$vis)

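# incorrect weights: sample(1:64, replace = TRUE) yields values that look like drawn
# indices, but used as weights they repeat every curve between 1 and 64 times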
str(reweightData(viscosity, vars=c("vis", "T_C", "T_A", "rspeed", "mflow"), 
    argvals = "time", weights = sample(1:64, replace = TRUE)), 1)

# supply meaningful index
str(visNew <- reweightData(viscosity, vars = c("vis", "T_C", "T_A", "rspeed", "mflow"), 
              argvals = "time", index = rep(1:32, each = 2)))
# check the result
# visNew$vis[1:5, 1:5]

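# calls that are expected to fail (not run)
if (FALSE) {
  # neither weights nor index is supplied
  reweightData(viscosity, argvals = "")
  # argvals names an entry that does not exist in the data
  reweightData(viscosity, argvals = "covThatDoesntExist", index = rep(1, 64))
}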
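A bootstrap resample can be represented either as drawn row indices (suitable for index) or as per-observation counts (suitable for weights). The base-R sketch below, which does not call FDboost, converts one into the other with tabulate() and checks that both describe the same multiset of the 64 curves; the variable names are illustrative only.

```r
set.seed(1)
N <- 64                                           # number of curves in viscosity
boot_index <- sample(seq_len(N), replace = TRUE)  # bootstrap draw as row indices
boot_weights <- tabulate(boot_index, nbins = N)   # same draw as per-curve counts
# the counts add up to the number of draws
stopifnot(sum(boot_weights) == N)
# expanding the counts recovers the drawn indices, up to ordering
stopifnot(identical(rep(seq_len(N), times = boot_weights), sort(boot_index)))
```

Either vector could then be passed on, as index = boot_index or as weights = boot_weights, in a call like the ones above. If weights are expanded as in the rep() step, each repeated curve appears in consecutive rows, so the row order may differ from the index version.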
   

FDboost documentation built on Aug. 12, 2023, 5:13 p.m.