brew() is the first function in the ipa workflow.
data
    a data frame with missing values.
outcome
    column name(s) of outcomes. These values can be provided as symbols (e.g., outcome = c(a,b,c) for multiple outcomes or outcome = a for one outcome) or character values (e.g., outcome = c('a','b','c') for multiple outcomes or outcome = 'a' for a single outcome).
flavor
    the computational approach that will be used to impute missing data. Valid options are 'kneighbors' and 'softImpute'. These values should be input as characters (e.g., 'kneighbors').
bind_miss
    (
Brewing a great beer is not that different from imputing
missing data. Once a brew is started, you can add spices
(set primary parameters; see spice) and then mash the mixture
(fit imputation models; see mash). To finish the brew, add
yeast (new data; see ferment), and then bottle it up (see
bottle) as a tibble or matrix.
brew() includes an argument called flavor that determines
how data will be imputed. brew_nbrs() and brew_soft() are
convenience functions; for example, brew_nbrs() is a
shortcut for calling brew(flavor = 'kneighbors').
brew() returns an ipa_brew object with your specified flavor.
The neighbor's brew is an adaptation of Max Kuhn's
nearest neighbor imputation functions in the recipes
and caret packages. It also uses the gower
package to compute Gower's distance.
What makes this type of nearest neighbor imputation
different is its flexibility in the number of neighbors used
to impute missing values and the aggregation function applied.
For example, creating 10 imputed datasets that use 1, 2, ..., 10
neighbors to impute missing values would require fitting
10 separate nearest neighbor models using conventional functions.
The ipa package lets a user create all of these imputed sets
with just one fitting of a nearest neighbor model. Additionally,
for users who want to use nearest neighbors for multiple imputation,
ipa gives the option to sample 1 neighbor value at random from
a neighborhood, rather than aggregate values into a summary.
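The idea of fitting once and reusing the result for several neighborhood sizes can be sketched in base R. This is only an illustration under simplifying assumptions, not the ipa implementation: it uses a single numeric predictor, a plain absolute difference instead of Gower's distance, and the mean as the aggregation function. All object names below are hypothetical.

```r
# Toy data: one complete predictor and an outcome-like column with one gap
x <- c(1, 2, 3, 4, 5, 6)
y <- c(NA, 10, 20, 30, 40, 50)

# Rows that can donate a value
donors <- which(!is.na(y))

# One "fit": distances from the incomplete row to every donor, sorted once
dists <- abs(x[donors] - x[1])
nbrs <- donors[order(dists)]

max_k <- 3

# A running mean over the sorted neighbors gives the imputed value
# for k = 1, 2, ..., max_k without recomputing any distances
imputed_by_k <- cumsum(y[nbrs[seq_len(max_k)]]) / seq_len(max_k)
print(imputed_by_k)  # 10 15 20

# Alternative for multiple imputation: draw one neighbor value at random
# from the neighborhood instead of aggregating
sampled <- y[nbrs[sample.int(max_k, 1)]]
```

Because the neighbors are sorted once, each additional value of k costs only one more term in the running mean.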
The soft brew uses the softImpute algorithm to impute
missing values. For more details on this strategy
for handling missing values, please see
softImpute.
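For intuition, the core iteration described by Mazumder, Hastie and Tibshirani (fill the gaps, take an SVD, shrink the singular values, refill the gaps) can be sketched in base R. This is a simplified illustration and not the code used by the softImpute package or by ipa; the function name soft_impute_sketch and its arguments are hypothetical, and it runs a fixed number of iterations with no convergence check.

```r
# Hypothetical helper: iterative soft-thresholded SVD on a numeric matrix
soft_impute_sketch <- function(x, lambda, n_iter = 50) {
  miss <- is.na(x)
  filled <- x
  filled[miss] <- 0  # start by filling the gaps with zeros

  for (iter in seq_len(n_iter)) {
    s <- svd(filled)
    # Soft-threshold the singular values at lambda
    d_shrunk <- pmax(s$d - lambda, 0)
    # Rebuild the low-rank approximation: U diag(d) V'
    low_rank <- s$u %*% (d_shrunk * t(s$v))
    # Observed cells stay fixed; only missing cells are updated
    filled[miss] <- low_rank[miss]
  }
  filled
}

x <- matrix(c(1, 2, 3, 4,
              2, 4, NA, 8,
              3, 6, 9, NA), nrow = 4)
filled <- soft_impute_sketch(x, lambda = 0.1)
print(filled)
```

Larger values of lambda shrink more singular values to zero, so the filled-in matrix has lower rank.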
Gower (1971) originally defined a similarity measure (s, say) with values ranging from 0 (completely dissimilar) to 1 (completely similar). The distance returned here equals 1-s.
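As a small worked example of the 1 - s relationship, the sketch below computes Gower's distance between two rows of a mixed-type data frame in base R. It assumes equal weights, no missing values, and a non-zero range for every numeric column; gower_pair is a hypothetical helper written for illustration and is not a function from the gower or ipa packages.

```r
# Hypothetical helper: Gower distance between rows i and j of a data frame
gower_pair <- function(df, i, j) {
  sims <- vapply(names(df), function(nm) {
    col <- df[[nm]]
    if (is.numeric(col)) {
      # Numeric: similarity is 1 minus the range-scaled absolute difference
      1 - abs(col[i] - col[j]) / diff(range(col))
    } else {
      # Categorical: similarity is 1 if the values match, 0 otherwise
      as.numeric(col[i] == col[j])
    }
  }, numeric(1))
  # Average the per-variable similarities to get s, then return 1 - s
  1 - mean(sims)
}

df <- data.frame(
  age = c(20, 30, 40),
  sex = c("f", "m", "f"),
  stringsAsFactors = FALSE
)

gower_pair(df, 1, 2)  # 0.75: age similarity 0.5, sex similarity 0
gower_pair(df, 1, 3)  # 0.5:  age similarity 0,   sex similarity 1
```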
Gower, John C. "A general coefficient of similarity and some of its properties." Biometrics (1971): 857-871.
Mazumder, Rahul, Trevor Hastie, and Rob Tibshirani. "Spectral regularization algorithms for learning large incomplete matrices." Journal of Machine Learning Research 11 (2010): 2287-2322. http://www.stanford.edu/~hastie/Papers/mazumder10a.pdf
library(ipa)

# Toy data: three predictors and one outcome
data <- data.frame(
  x1 = 1:10,
  x2 = 10:1,
  x3 = 1:10,
  outcome = 11 + runif(10)
)

# Set the first two rows of x1 and x2 to missing
data[1:2, 1:2] <- NA

# Start a brew by naming the flavor explicitly
knn_brew <- brew(data, outcome = outcome, flavor = 'kneighbors')
sft_brew <- brew(data, outcome = outcome, flavor = 'softImpute')

# Or use the convenience functions
knn_brew <- brew_nbrs(data, outcome = outcome)
sft_brew <- brew_soft(data, outcome = outcome)

print(knn_brew)