masher_nbrs: Neighbor's masher


View source: R/mash.R

Description

If you use RStudio, the masher and spicer functions can help remind you which parameters go along with which ipa_brew flavor. The basic idea is to write spice(brew, with = spicer_<flavor>()) and mash(brew, with = masher_<flavor>()). Hitting the tab key with your cursor inside the parentheses of masher_<flavor>() will create a drop-down menu that lists the arguments that go along with your brew's flavor.

If you have no trouble remembering the parameters that go along with your brew's flavor, or if you just want your code to be more concise, you don't have to use the with argument. Instead, you can just specify parameter values directly using the ... argument in the mash and spice functions. In the examples below, both approaches are shown.

Usage

masher_nbrs(
  epsilon = 1e-08,
  nthread = NULL,
  fun_aggr_ctns = mean,
  fun_aggr_intg = medn_est,
  fun_aggr_catg = mode_est
)

Arguments

epsilon

Computed numbers (variable ranges) smaller than epsilon are treated as zero.

nthread

Number of threads to use for parallelization. By default, 2 threads are used on a dual-core machine; on any other machine, n-1 cores are used so that the machine doesn't freeze during a big computation. The maximum number of threads is determined using omp_get_max_threads at the C level.
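As a rough illustration of the default described above, the rule could be sketched in base R as follows. The helper name default_threads() is hypothetical and is not part of the package. The floor of one thread on a single-core machine is also an assumption, since the text does not cover that case.

```r
# Hypothetical helper (not part of the package): mirrors the documented
# default of 2 threads on a dual-core machine and n-1 threads otherwise.
default_threads <- function(n_cores) {
  if (n_cores == 2) {
    return(2)
  }
  # Assumption: never go below one thread (e.g., on a single-core machine).
  max(1, n_cores - 1)
}

default_threads(2)
default_threads(8)
```

Passing an explicit value, e.g. masher_nbrs(nthread = 1), sidesteps the default entirely.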

fun_aggr_ctns

a function used to aggregate neighbors for continuous variables. If unspecified, the mean() function is used.

fun_aggr_intg

a function used to aggregate neighbors for integer-valued variables. If unspecified, the medn_est() function is used. This function returns the median of neighbor values, rounded to the nearest integer. medn_est_conserve() goes one step further: it identifies which neighbor value is closest to the median and returns that value. Both options can be helpful for integer-valued columns if you want to make sure the imputed values do not contain impossible quantities, e.g., no. of children = 3/4.
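To make the difference between the two strategies concrete, here is a minimal base R sketch. The functions round_median() and nearest_to_median() are hypothetical stand-ins written for this illustration; they are not the package's medn_est() and medn_est_conserve(), whose tie-breaking and edge-case handling may differ.

```r
# Hypothetical stand-ins for illustration only.

# Median of the neighbor values, rounded to the nearest integer.
round_median <- function(x) {
  as.integer(round(median(x)))
}

# The observed neighbor value closest to the median
# (ties go to the first such value, an assumption of this sketch).
nearest_to_median <- function(x) {
  x[which.min(abs(x - median(x)))]
}

nbr_values <- c(1L, 2L, 4L, 7L)

mean(nbr_values)              # fractional, so unsuitable for a count
round_median(nbr_values)      # an integer, but not one of the neighbors
nearest_to_median(nbr_values) # an integer that was actually observed
```

With these neighbor values the mean is 3.5, the rounded median is 3, and the nearest observed value is 2, which shows why the conservative variant can only return values that already occur among the neighbors.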

fun_aggr_catg

a function used to aggregate neighbors for categorical variables. If unspecified, the mode_est() function is used.
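For intuition, aggregating categorical neighbors by their mode might look like the base R sketch below. The function most_common() is a hypothetical name used only for this illustration and is not the package's mode_est(); in particular, how mode_est() breaks ties is not stated here, so the tie rule in the sketch is an assumption.

```r
# Hypothetical stand-in for illustration only.
# Returns the most frequent value among the neighbors; on a tie, the
# first level in table() order wins (an assumption of this sketch).
most_common <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

nbr_labels <- c("a", "b", "b", "c")
most_common(nbr_labels)
```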

Value

a list with input values that can be passed directly into mash, e.g., mash(brew, with = masher_nbrs()) for a neighbors brew or mash(brew, with = masher_soft()) for a soft brew.

Examples

x1 = rnorm(100)
x2 = rnorm(100) + x1
x3 = rnorm(100) + x1 + x2

outcome = 0.5 * (x1 - x2 + x3)

data <- data.frame(x1=x1, x2=x2, x3=x3, outcome=outcome)

n_miss = 10

data[1:n_miss,'x1'] = NA
sft_brew <- brew_soft(data, outcome=outcome, bind_miss = FALSE)

# these two calls are equivalent
mash(sft_brew, with = masher_soft(bs = FALSE))
mash(sft_brew, bs = FALSE)

knn_brew <- brew_nbrs(data, outcome=outcome, bind_miss = TRUE)

# these two calls are equivalent
mash(knn_brew, with = masher_nbrs(fun_aggr_ctns = median))
mash(knn_brew, fun_aggr_ctns = median)

bcjaeger/midy documentation built on May 3, 2020, 3:55 p.m.