I'm not quite sure about the best way to sample from the distribution for an input of interest, conditional on the other inputs, when those other inputs include categorical variables.
Gelman & Pardoe 2007 discuss what to do when the input of interest is an unordered categorical variable, but I'm mainly interested in applying the statistic I'm calling impact to categorical variables, since impact is comparable across all types of inputs.
Rigth now we're just providing point summaries. It'd be nice to see the distribution. (This is different from accounting for model uncertainty.)
As discussed in the paper, it is entirely natural to allow for multiple samples of parameters and plot these measures with their uncertainties as in the paper.
I'm also interested in the uncertainty that arises from the density estimation.
More work is necessary to understand better what the weights should be / how to best represent the appropriate conditional distributions. Maybe in some cases we should even drop the distance-based approach and use something like BART?
I should deal with potential column name conflicts in data frames (e.g. I use a "Weight" column, so I'll have problems if one of the inputs has that name).
I'd like to compile a list of other work in this direction, maybe comparing them with this (and perhaps giving examples that show why/when you'd want to use this instead).
I should add a page discussing other methods people have used to get at somewhat the same idea.
randomForest
package in R (partial plots, variable importance)earth
package in R (variable importance)caret
to see what's available thereAdd the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.