varslice_resample: Generate data set for estimated outcome variable values given...

View source: R/varslice_resample.R

varslice_resample    R Documentation

Generate a data set of estimated outcome variable values for a given value of the influencing variable, based on a slice of 'z' from the kernel density surface of the in_var and out_var data.

Description

Resampling function for slices through a kernel density surface. First, a two-dimensional kernel density surface is estimated from the in_var / out_var pairs. The function then extracts values from this surface for a specified value of in_var (expectedin_var), reading off relative probabilities at the out_var values defined by out_var_sampling. Based on these points, n_samples values of out_var are resampled, with each point weighted by its relative probability.
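
The procedure described above can be sketched in a few lines of R. This is a minimal, hypothetical reimplementation (the name varslice_sketch is made up for illustration), not the package's own code. It assumes that the slice is read off the kde2d grid by linear interpolation with approx(), first across the in_var axis at expectedin_var and then along the out_var axis at the sampling points, and that resampling draws with replacement in proportion to the relative probabilities.

```r
# Hypothetical sketch (NOT the package's implementation): slice a kde2d
# surface at expectedin_var and resample out_var values from the slice.
library(MASS)  # provides kde2d(); MASS ships with standard R installs

varslice_sketch <- function(in_var, out_var, expectedin_var,
                            n = 100, n_samples = 1000,
                            out_var_sampling = 1000) {
  # 1. Two-dimensional kernel density surface; dens$z[i, j] is the
  #    density at in_var = dens$x[i] and out_var = dens$y[j]
  dens <- kde2d(in_var, out_var, n = n)

  # 2. Slice at expectedin_var: for every out_var grid column, linearly
  #    interpolate the density across the in_var grid (assumption)
  slice_z <- apply(dens$z, 2, function(column)
    approx(dens$x, column, xout = expectedin_var, rule = 2)$y)

  # 3. out_var values at which relative probabilities are extracted
  out_points <- if (length(out_var_sampling) == 1) {
    seq(min(out_var), max(out_var), length.out = out_var_sampling)
  } else {
    out_var_sampling
  }
  rel_prob <- approx(dens$y, slice_z, xout = out_points, rule = 2)$y

  slice <- data.frame(Output_values = out_points,
                      Relative_probability = rel_prob)

  # 4. Draw n_samples indices with replacement, weighted by the
  #    relative probabilities; sample.int() avoids sample()'s 1:x
  #    behaviour when only one out_var point is requested
  idx <- sample.int(length(out_points), size = n_samples,
                    replace = TRUE, prob = rel_prob)
  list(slice = slice, resampled = out_points[idx])
}

# Usage with random data shaped like the Examples section
set.seed(1)
in_var <- sample(x = 1:200, size = 25, replace = TRUE)
out_var <- sample(x = 1000:7000, size = 25, replace = TRUE)
res <- varslice_sketch(in_var, out_var, expectedin_var = 150)
```

Interpolating the grid, rather than snapping to the nearest grid row, keeps the slice smooth when expectedin_var falls between grid points; the real function may make a different choice.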

Usage

varslice_resample(
  in_var,
  out_var,
  expectedin_var,
  n = 100,
  n_samples = 1000,
  out_var_sampling = 1000
)

Arguments

in_var

is a numeric vector of observations of an influencing variable, paired element-wise with the observed values of the outcome variable out_var.

out_var

is a numeric vector of observed values of an outcome variable, paired element-wise with the observations of the influencing variable in_var.

expectedin_var

is the value of the influencing variable in_var at which the density surface is sliced, i.e. the value for which the outcome variable out_var should be estimated.

n

Number of grid points in each direction. Can be a scalar or a length-2 integer vector (passed to the kde2d kernel density function of the MASS package).

n_samples

is the number of samples to draw in the resampling procedure.

out_var_sampling

sampling scheme for extracting values from the kernel density surface. It is used to create the vector of out_var values at which probabilities are extracted. NOTE that only these values can later be returned by the resampling. This can be either a single number, in which case that many evenly spaced out_var values are created (default 1000), or a numeric vector of values within the out_var range, in which case probabilities are extracted only for the specified numbers.
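
The two accepted forms of out_var_sampling can be illustrated with a small helper. expand_out_var_sampling() is a hypothetical name, and the assumption that the evenly spaced values run from min(out_var) to max(out_var) is ours; the documentation only states that a single number produces that many evenly spaced points.

```r
# Hypothetical helper illustrating the two forms of out_var_sampling;
# assumes evenly spaced values span range(out_var).
expand_out_var_sampling <- function(out_var, out_var_sampling = 1000) {
  if (length(out_var_sampling) == 1) {
    # A single number: that many evenly spaced out_var values
    seq(min(out_var), max(out_var), length.out = out_var_sampling)
  } else {
    # A numeric vector: use exactly these values, which should lie
    # within the observed out_var range
    stopifnot(all(out_var_sampling >= min(out_var)),
              all(out_var_sampling <= max(out_var)))
    out_var_sampling
  }
}

out_var <- c(1200, 2500, 4100, 6800)
grid_default <- expand_out_var_sampling(out_var)       # 1000 values
grid_coarse  <- expand_out_var_sampling(out_var, 100)  # 100 values
grid_iso     <- expand_out_var_sampling(out_var, c(2000, 3000, 4000, 5000))
```

Under this reading, a smaller single number gives a coarser grid, and only the returned values can ever appear in the resampled output.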

Value

A list of two elements: 'slice' is a data.frame with columns Output_values and Relative_probability, representing the slice through the density surface on which the resampling was based; 'resampled' is a vector of the values returned by the resampling (containing only numbers that appear in the Output_values column of 'slice').
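
The relationship between the two list elements can be checked on a toy slice. The values and probabilities below are made up for illustration, and drawing with replacement in proportion to Relative_probability is an assumption about how the resampling uses the slice; the property that every resampled value appears in Output_values follows from the description above.

```r
# Toy 'slice' with made-up relative probabilities (illustration only)
slice <- data.frame(Output_values = c(2000, 3000, 4000, 5000),
                    Relative_probability = c(0.1, 0.4, 0.4, 0.1))

# Assumed resampling step: weighted draws with replacement
set.seed(42)
idx <- sample.int(nrow(slice), size = 1000, replace = TRUE,
                  prob = slice$Relative_probability)
resampled <- slice$Output_values[idx]

# Every resampled value is taken from Output_values
stopifnot(all(resampled %in% slice$Output_values))
# Draw frequencies should roughly follow Relative_probability
print(table(resampled) / length(resampled))
```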

Examples

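set.seed(42)  # make the randomly generated example data reproducible
in_var <- sample(x = 1:200, size = 25, replace = TRUE)
out_var <- sample(x = 1000:7000, size = 25, replace = TRUE)
resampled <- varslice_resample(in_var, out_var, expectedin_var = 150)
# relative probability along the slice at in_var = 150
plot(resampled$slice$Output_values,
     resampled$slice$Relative_probability)
# distribution of the resampled out_var values
hist(resampled$resampled)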

# with a coarser resolution (100 evenly spaced out_var values
# instead of the default 1000)
resampled_coarse <- varslice_resample(in_var, out_var,
                                      expectedin_var = 40,
                                      out_var_sampling = 100)
plot(resampled_coarse$slice$Output_values,
     resampled_coarse$slice$Relative_probability)
hist(resampled_coarse$resampled)

# for isolated out_var values only (probabilities are extracted, and
# values resampled, only at 2000, 3000, 4000 and 5000)
resampled_iso <- varslice_resample(in_var, out_var, expectedin_var = 40,
                                   out_var_sampling = c(2000, 3000, 4000, 5000))
plot(resampled_iso$slice$Output_values,
     resampled_iso$slice$Relative_probability)
hist(resampled_iso$resampled)



CWWhitney/uncertainty documentation built on June 14, 2022, 10:21 p.m.