jdify is an R package implementing classifiers based on the joint density of the predictors and the class variable. Several methods for joint density estimation can be used.
To install the package from GitHub, open R and type:

```r
devtools::install_github("tnagler/jdify")
```
The core functionality is illustrated below. For a detailed description of all functions and their arguments, see the API documentation.
The core function in this package is `jdify()`, which builds a classification model for a given data set. It estimates the joint density of the predictors and the class variable and derives conditional class probabilities from it.
```r
library(jdify)
set.seed(5)  # for reproducibility

# toy data: binary class, one continuous and one ordinal predictor
dat <- data.frame(
  cl = as.factor(rbinom(10, 1, 0.5)),
  x1 = rnorm(10),
  x2 = ordered(rbinom(10, 5, 0.3))
)
model <- jdify(cl ~ x1 + x2, data = dat, jd_method = "cctools")
probs <- predict(model, dat, what = "probs")  # conditional class probabilities
```
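The step from a joint density to class probabilities is Bayes' rule: P(C = k | x) = f(x, k) / sum_j f(x, j). The base R sketch below illustrates only this normalization step. The helper names `joint_to_probs()` and `joint_density()` are hypothetical and not part of jdify, and the per-class kernel density estimate (via `stats::density()`) is a simple stand-in for the joint estimators that jdify actually uses.

```r
# Hypothetical helper (not part of jdify): normalizes joint density values
# f(x, k), one row per point and one column per class k, into
# P(C = k | x) = f(x, k) / sum_j f(x, j).
joint_to_probs <- function(joint) joint / rowSums(joint)

set.seed(1)
x  <- c(rnorm(50, mean = -1), rnorm(50, mean = 1))
cl <- factor(rep(c("a", "b"), each = 50))

# Illustrative joint density estimate (an assumption, not jdify's method):
# f(x, k) = P(C = k) * f(x | C = k), where f(x | C = k) is a univariate
# kernel density estimate interpolated at the points x0.
joint_density <- function(x0) {
  sapply(levels(cl), function(k) {
    d <- density(x[cl == k])
    mean(cl == k) * approx(d$x, d$y, xout = x0, rule = 2)$y
  })
}

probs <- joint_to_probs(joint_density(c(-1, 0, 1)))
probs  # rows: x0 = -1, 0, 1; columns: P(C = "a" | x0), P(C = "b" | x0)
```

Points near -1 get a high probability for class `"a"`, points near 1 for class `"b"`, and each row sums to one.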
`jdify()` can also handle discrete predictors. They have to be declared as `ordered` (for ordinal variables) or `factor` (for unordered categorical variables); all other variables are treated as continuous.
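Preparing a data set therefore comes down to setting the column types in base R before calling `jdify()`. The column names below are hypothetical. Note that, by the rule above, an integer column that is left as a plain integer would be treated as continuous.

```r
raw <- data.frame(
  cl    = c("yes", "no", "yes", "no"),
  count = c(0L, 2L, 1L, 3L),                  # ordinal
  color = c("red", "blue", "red", "green"),   # nominal
  size  = c(1.2, 3.4, 2.2, 0.7)               # continuous
)
raw$cl    <- factor(raw$cl)        # class variable
raw$count <- ordered(raw$count)    # ordinal predictor -> ordered
raw$color <- factor(raw$color)     # unordered categorical -> factor
# size stays numeric and is therefore treated as continuous

sapply(raw, function(v) class(v)[1])
#>        cl     count     color      size
#>  "factor" "ordered"  "factor" "numeric"
```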
You can choose from three built-in methods for joint density estimation (argument `jd_method`): `"cctools"` (default), `"kdevine"`, and `"np"`. The method name indicates the package that is used for the estimation.
You can also plug in a custom density estimator by wrapping a fitting function and an evaluation function with `jd_method()`. The following code re-implements the built-in method `"kdevine"`:
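```r
# fit_fun estimates the joint density from the training data;
# eval_fun evaluates the fitted density at new data points
my_fit  <- function(x, ...) kdevine::kdevine(x, ...)
my_eval <- function(object, newdata, ...) kdevine::dkdevine(newdata, object)
# cc = TRUE: the estimator does not handle discrete data natively
my_method <- jd_method(fit_fun = my_fit, eval_fun = my_eval, cc = TRUE)
model <- jdify(cl ~ x1 + x2, data = dat, jd_method = my_method)
```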
The option `cc = TRUE` indicates that the method does not natively handle discrete data. In this case, jdify automatically applies the continuous convolution trick (see Nagler, 2017).
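The idea behind continuous convolution is to turn discrete variables into continuous ones by adding independent continuous noise, so that an estimator built for continuous data can be applied. The sketch below is a simplified illustration under stated assumptions. The function `cont_conv_sketch()` is hypothetical. It uses Uniform(-0.5, 0.5) noise on the integer codes of an ordered variable, and the noise distribution, scale, and handling of unordered factors used by jdify/cctools may differ.

```r
# Hypothetical, simplified continuous convolution (not the cctools
# implementation): integer code of each level + Uniform(-0.5, 0.5) noise.
cont_conv_sketch <- function(x) {
  codes <- as.integer(x)  # ordered factor -> integer codes 1..K
  codes + runif(length(codes), min = -0.5, max = 0.5)
}

set.seed(5)
x2 <- ordered(rbinom(10, 5, 0.3))
z2 <- cont_conv_sketch(x2)

# The noise never crosses the midpoint between neighbouring codes,
# so rounding recovers the original categories.
all(round(z2) == as.integer(x2))
```

Because the noise stays strictly inside (-0.5, 0.5), no information about the original categories is lost.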
`cv_jdify()` is a convenience function for k-fold cross-validation. It splits the data into folds, fits a joint density model on each training set, and evaluates the conditional class probabilities on the corresponding hold-out sample.
```r
cv <- cv_jdify(cl ~ x1 + x2, data = dat, folds = 3)
cv$cv_probs  # out-of-fold conditional class probabilities
```
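The fold logic can be sketched in base R. This is not the code of `cv_jdify()`: `kfold_ids()` is a hypothetical helper, and the "model" is a placeholder that predicts the class frequencies of the training folds, in the spot where `cv_jdify()` would fit a joint density model instead. What the sketch shows is that every observation receives probabilities from a model that never saw it.

```r
# Hypothetical helper: random assignment of n observations to k folds
# of (almost) equal size.
kfold_ids <- function(n, k) sample(rep_len(seq_len(k), n))

set.seed(5)
n <- 10
k <- 3
folds <- kfold_ids(n, k)
cl <- factor(rbinom(n, 1, 0.5), levels = c(0, 1))

cv_probs <- matrix(NA_real_, n, nlevels(cl),
                   dimnames = list(NULL, levels(cl)))
for (f in seq_len(k)) {
  test <- folds == f
  # Placeholder model fitted on the training folds only.
  train_freq <- as.numeric(table(cl[!test])) / sum(!test)
  # Predict the hold-out fold with the model fitted on the other folds.
  cv_probs[test, ] <- matrix(train_freq, sum(test), nlevels(cl), byrow = TRUE)
}
cv_probs
```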
The function `assess_clsfyr()` computes several performance measures from conditional class probabilities. Its first argument is the predicted probability of a class, the second is a logical indicator of whether each observation belongs to that class.
```r
# accuracy ("ACC") and F1 score from the out-of-fold probabilities of class "0"
assess_clsfyr(cv$cv_probs[, 1], dat[, 1] == 0, measure = c("ACC", "F1"))
```
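For intuition, accuracy and F1 can be computed by hand once the probabilities are turned into class predictions. The helper `acc_f1()` below is hypothetical. The 0.5 threshold is an assumption and may not match the convention used by `assess_clsfyr()`.

```r
# Hypothetical helper (not assess_clsfyr()): predict "in class" when the
# probability exceeds the threshold, then compute accuracy and F1.
acc_f1 <- function(prob, is_class, threshold = 0.5) {
  pred <- prob > threshold
  tp <- sum(pred & is_class)    # true positives
  fp <- sum(pred & !is_class)   # false positives
  fn <- sum(!pred & is_class)   # false negatives
  c(ACC = mean(pred == is_class),
    F1  = 2 * tp / (2 * tp + fp + fn))
}

res <- acc_f1(c(0.9, 0.2, 0.7, 0.4), c(TRUE, FALSE, FALSE, TRUE))
res  # ACC = 0.5, F1 = 0.5
```

Here one positive is found (TP = 1), one negative is wrongly flagged (FP = 1), and one positive is missed (FN = 1). This gives F1 = 2 / (2 + 1 + 1) = 0.5, and two of the four predictions are correct.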
Nagler, T. (2017). A generic approach to nonparametric function estimation with mixed data. arXiv:1704.07457