brew() is the first function in the ipa workflow.
data | a data frame with missing values.
outcome | column name(s) of outcomes. These values can be provided as symbols (e.g., outcome = c(a,b,c) for multiple outcomes or outcome = a for one outcome) or character values (e.g., outcome = c('a','b','c') for multiple outcomes or outcome = 'a' for a single outcome).
flavor | the computational approach that will be used to impute missing data. Valid options are 'kneighbors' and 'softImpute'. These values should be input as characters (e.g., 'kneighbors').
bind_miss | (
Brewing a great beer is not that different from imputing
missing data. Once a brew is started, you can add spices
(set primary parameters; see spice) and then mash the mixture
(fit imputation models; see mash). To finish the brew, add
yeast (new data; see ferment), and then bottle it up (see
bottle) as a tibble or matrix.
brew() includes an input variable called flavor that determines
how data will be imputed. brew_nbrs() and brew_soft() are
convenience functions, e.g. brew_nbrs() is a shortcut for
calling brew(flavor = 'kneighbors').
brew() returns an ipa_brew object with your specified flavor.
The neighbor's brew (flavor = 'kneighbors') is an adaptation of
Max Kuhn's nearest neighbor imputation functions in the recipes
and caret packages. It also uses the gower package to implement
algorithms that compute Gower's distance.
What makes this type of nearest neighbor imputation
different is its flexibility in the number of neighbors used
to impute missing values and the aggregation function applied.
For example, creating 10 imputed datasets that use 1, 2, ..., 10
neighbors to impute missing values would require fitting
10 separate nearest neighbor models using conventional functions.
The ipa package lets a user create all of these imputed sets
with just one fitting of a nearest neighbor model. Additionally,
for users who want to use nearest neighbors for multiple imputation,
ipa gives the option to sample 1 neighbor value at random from
a neighborhood, rather than aggregate values into a summary.
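The idea of reusing a single neighbor search for many neighborhood sizes can be sketched in base R. This is only an illustration under simplifying assumptions (one numeric predictor, absolute difference as the distance, mean as the aggregation function); the toy data and all object names are hypothetical and do not reflect the internals of ipa.

```r
# Hypothetical toy data: five complete donor rows and one row missing x1.
donors <- data.frame(x2 = c(1, 2, 3, 4, 5), x1 = c(10, 20, 30, 40, 50))
target_x2 <- 2.2  # observed x2 for the row whose x1 is missing

# Step 1 (done once): rank donors by distance to the incomplete row.
dists <- abs(donors$x2 - target_x2)
nbr_values <- donors$x1[order(dists)]

# Step 2: imputed values for k = 1, ..., 5 neighbors are running means
# of the ranked donor values, so no model is refit for each k.
k_seq <- seq_along(nbr_values)
imputed_by_k <- cumsum(nbr_values) / k_seq

# Alternative for multiple imputation: draw one donor value at random
# from the k nearest neighbors instead of averaging them.
k <- 3
sampled <- nbr_values[sample.int(k, size = 1)]

print(imputed_by_k)
```

The expensive part (computing and ranking distances) happens once; each additional neighborhood size only costs one more running mean.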
The soft brew (flavor = 'softImpute') imputes missing values
with the softImpute algorithm. For more details on this strategy
for handling missing values, please see softImpute.
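For readers unfamiliar with the approach, the core iteration described by Mazumder, Hastie and Tibshirani (2010) can be sketched in base R: fill the missing cells with the current estimate, take a singular value decomposition, shrink the singular values by a penalty, and repeat. The function name soft_impute_sketch, its arguments, and the stopping rule are assumptions made for illustration; this is not the code of the softImpute package, which offers faster algorithms and more options.

```r
# Illustrative soft-thresholded SVD iteration (not the softImpute package code).
soft_impute_sketch <- function(x, lambda = 1, max_iter = 100, tol = 1e-6) {
  miss <- is.na(x)
  z <- matrix(0, nrow(x), ncol(x))  # current low-rank estimate
  for (iter in seq_len(max_iter)) {
    # Keep observed cells; fill missing cells with the current estimate.
    filled <- x
    filled[miss] <- z[miss]
    # Shrink each singular value toward zero by lambda (soft-thresholding).
    s <- svd(filled)
    d <- pmax(s$d - lambda, 0)
    z_new <- s$u %*% (d * t(s$v))
    # Stop once the estimate barely changes between iterations.
    change <- sum((z_new - z)^2) / max(sum(z^2), 1e-12)
    z <- z_new
    if (change < tol) break
  }
  x[miss] <- z[miss]
  x
}

# Hypothetical example: a rank-1 matrix with two cells removed.
m <- outer(1:6, 1:4)
m_miss <- m
m_miss[1, 2] <- NA
m_miss[4, 3] <- NA
completed <- soft_impute_sketch(m_miss, lambda = 0.1)
print(round(completed, 2))
```

Larger values of lambda shrink the estimate more strongly and favor lower-rank solutions; observed cells are never altered.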
Gower (1971) originally defined a similarity measure (s, say) with values ranging from 0 (completely dissimilar) to 1 (completely similar). The distance returned here equals 1-s.
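The relationship between similarity and distance can be checked with a small base R sketch for numeric variables, where the per-variable similarity is 1 minus the absolute difference divided by that variable's range, and the overall similarity is the average across variables. The helper gower_sketch and the toy data are hypothetical, and skipping constant or missing variables is an assumption of this sketch; the gower package handles more variable types.

```r
# Illustrative Gower similarity and distance for numeric columns only.
gower_sketch <- function(df, i, j) {
  # Range of each variable, used to scale absolute differences to [0, 1].
  ranges <- vapply(df, function(col) diff(range(col, na.rm = TRUE)), numeric(1))
  xi <- unlist(df[i, ])
  xj <- unlist(df[j, ])
  # Assumption: skip variables that are missing in either row or constant.
  ok <- !is.na(xi) & !is.na(xj) & ranges > 0
  s <- mean(1 - abs(xi[ok] - xj[ok]) / ranges[ok])
  c(similarity = s, distance = 1 - s)
}

toy <- data.frame(a = c(0, 5, 10), b = c(1, 1, 3))
res <- gower_sketch(toy, 1, 2)
print(res)
```

For rows 1 and 2 of the toy data, variable a contributes a similarity of 0.5 and variable b contributes 1, giving s = 0.75 and a distance of 0.25.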
Gower, John C. "A general coefficient of similarity and some of its properties." Biometrics (1971): 857-871.
Mazumder, Rahul, Trevor Hastie, and Rob Tibshirani. "Spectral regularization algorithms for learning large incomplete matrices." Journal of Machine Learning Research 11 (2010): 2287-2322. http://www.stanford.edu/~hastie/Papers/mazumder10a.pdf
# toy data: three predictors and one outcome
data <- data.frame(
  x1 = 1:10,
  x2 = 10:1,
  x3 = 1:10,
  outcome = 11 + runif(10)
)
# set the first two rows of x1 and x2 to missing
data[1:2, 1:2] <- NA
# start a brew by naming the flavor explicitly
knn_brew <- brew(data, outcome = outcome, flavor = 'kneighbors')
sft_brew <- brew(data, outcome = outcome, flavor = 'softImpute')
# equivalent calls using the convenience functions
knn_brew <- brew_nbrs(data, outcome = outcome)
sft_brew <- brew_soft(data, outcome = outcome)
print(knn_brew)