Description
Perform probability estimation using jittering with over or undersampling.
Usage

jous(X, y, class_func, pred_func, type = c("under", "over"), delta = 10,
     nu = 1, X_pred = NULL, keep_models = FALSE, verbose = FALSE,
     parallel = FALSE, packages = NULL)
Arguments

X
    A matrix of continuous predictors.
y
    A vector of responses with entries in c(-1, 1).
class_func
    Function to perform classification. This function definition must be
    exactly of the form class_func(X, y), where X is a feature matrix and y is
    a vector of responses in c(-1, 1), and it must return a fitted object that
    pred_func can use to make predictions.
pred_func
    Function to create predictions. This function definition must be
    exactly of the form pred_func(fit_obj, X), where fit_obj is an object
    returned by class_func, and it must return a vector of class predictions
    in c(-1, 1).
type
    Type of sampling: "over" for oversampling, or "under" for undersampling.
delta
    An integer (greater than 3) to control the number of quantiles to
    estimate: the target quantiles are 1/delta, 2/delta, ..., (delta - 1)/delta.
nu
    The amount of noise to apply to predictors when oversampling data.
    The noise level is controlled by nu * sd(X[, j]) for each predictor
    column j; the default nu = 1 should work for most data sets.
X_pred
    A matrix of predictors for which to form probability estimates.
keep_models
    Whether to store all of the models used to create
    the probability estimates. If type = "over", setting this to TRUE may
    require a large amount of memory.
verbose
    If TRUE, print the function's progress to the terminal.
parallel
    If TRUE, fit the models in parallel using foreach. A parallel backend
    (for example, one registered via doParallel) must be set up beforehand.
packages
    If parallel = TRUE, a character vector naming the packages that must be
    loaded on each worker (passed to foreach).
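The class_func / pred_func contract above can be illustrated with a minimal, hypothetical pair (a toy threshold classifier invented for illustration, not part of JOUSBoost):

```r
# A hypothetical class_func/pred_func pair matching the required signatures.
# Any classifier works, as long as class_func(X, y) returns a fitted object
# and pred_func(fit_obj, X) returns labels in c(-1, 1).
class_func <- function(X, y) {
  # "fit": threshold the first predictor at its mean (toy model)
  list(cut = mean(X[, 1]))
}

pred_func <- function(fit_obj, X) {
  ifelse(X[, 1] > fit_obj$cut, 1, -1)
}

set.seed(1)
X <- matrix(rnorm(40), ncol = 2)
y <- ifelse(X[, 1] > 0, 1, -1)
fit <- class_func(X, y)
preds <- pred_func(fit, X)
```

Any real classifier (rpart, ksvm, adaboost, ...) can be wrapped the same way, as the Examples below show.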
Value

Returns a list containing information about the parameters used in the jous
function call, as well as the following additional components:
q
    The vector of target quantiles estimated by jous. Note that the estimated
    probabilities will be located at the midpoints of the values in q.
phat_train
    The in-sample probability estimates p(y = 1 | x).
phat_test
    Probability estimates for the optional test data in X_pred.
models
    If keep_models = TRUE, a list of the fitted models used to create the
    probability estimates.
confusion_matrix
    A confusion matrix for the in-sample fits.
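As a sketch of the quantile grid implied by delta (assuming equally spaced quantiles at multiples of 1/delta, consistent with the delta description above):

```r
delta <- 10
q <- (1:(delta - 1)) / delta      # target quantiles: 0.1, 0.2, ..., 0.9
# estimated probabilities fall at the midpoints of the cells defined by q
mids <- (c(0, q) + c(q, 1)) / 2   # 0.05, 0.15, ..., 0.95
```

So with the default delta = 10, the returned phat values are drawn from a grid of ten midpoints spaced 1/delta apart.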
Note

The jous function runs the classifier class_func a total of delta times on
the data, which can be computationally expensive. Also, jous cannot yet be
applied to categorical predictors: in the oversampling case, it is not clear
how to "jitter" a categorical variable.
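For intuition about the "jittering" the Note refers to, here is a hedged sketch of adding noise scaled by nu * sd(X[, j]) to rows duplicated during oversampling. The uniform noise distribution is an assumption made for illustration; the package's internal noise scheme may differ.

```r
set.seed(1)
nu <- 1
X <- matrix(rnorm(40), ncol = 2)
# rows duplicated by oversampling
dup <- X[sample(nrow(X), 10, replace = TRUE), , drop = FALSE]
# per-column noise scale nu * sd(X[, j])
scales <- nu * apply(X, 2, sd)
# jitter: add column-scaled noise to the duplicated rows
noise <- sweep(matrix(runif(length(dup), -1, 1), nrow = nrow(dup)), 2, scales, `*`)
X_jit <- dup + noise
```

This also makes clear why categorical predictors are a problem: there is no analogous small perturbation for a factor level.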
References

Mease, D., Wyner, A. and Buja, A. (2007). Cost-weighted boosting with
jittering and over/under-sampling: JOUS-boost. Journal of Machine Learning
Research, 8, 409-439.
Examples

## Not run:
# Generate data from Friedman model #
set.seed(111)
dat = friedman_data(n = 500, gamma = 0.5)
train_index = sample(1:500, 400)
# Apply jous to adaboost classifier
class_func = function(X, y) adaboost(X, y, tree_depth = 2, n_rounds = 200)
pred_func = function(fit_obj, X_test) predict(fit_obj, X_test)
jous_fit = jous(dat$X[train_index,], dat$y[train_index], class_func,
pred_func, keep_models = TRUE)
# get probability
phat_jous = predict(jous_fit, dat$X[-train_index, ], type = "prob")
# compare with probability from AdaBoost
ada = adaboost(dat$X[train_index,], dat$y[train_index], tree_depth = 2,
n_rounds = 200)
phat_ada = predict(ada, dat$X[-train_index,], type = "prob")
mean((phat_jous - dat$p[-train_index])^2)
mean((phat_ada - dat$p[-train_index])^2)
## Example using parallel option
library(doParallel)
cl <- makeCluster(4)
registerDoParallel(cl)
# n.b. packages = 'rpart' is not strictly needed here, since JOUSBoost
# exports it automatically; it is included for illustration
jous_fit = jous(dat$X[train_index,], dat$y[train_index], class_func,
pred_func, keep_models = TRUE, parallel = TRUE,
packages = 'rpart')
phat = predict(jous_fit, dat$X[-train_index,], type = 'prob')
stopCluster(cl)
## Example using SVM
library(kernlab)
class_func = function(X, y) ksvm(X, as.factor(y), kernel = 'rbfdot')
pred_func = function(obj, X) as.numeric(as.character(predict(obj, X)))
jous_obj = jous(dat$X[train_index,], dat$y[train_index], class_func = class_func,
pred_func = pred_func, keep_models = TRUE)
jous_pred = predict(jous_obj, dat$X[-train_index,], type = 'prob')
## End(Not run)