In jbkunst/klassets: Tools to simulate data set to teach Statistical Models and ML Algorithms

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  gganimate = list(nframes = 50)
)

library(ggplot2)

theme_set(theme_minimal(base_size = 21))

if(require(showtext)){

  sysfonts::font_add_google("IBM Plex Sans", "plex")
  showtext::showtext_auto()

}

Generating data set

The main function is sim_response_xy, you need to define:

A number of observations to simulate.
Distributions to sample $x$ and $y$. For example purrr::partial(runif, min = -1, max = 1).
A function to define the relation between the response and the $x$ and $y$, for example function(x, y) x > y This function must return a logical value.
A number to define the noise in the generated data.

library(klassets)
library(ggplot2)
library(patchwork)

set.seed(123)

df_default <- sim_response_xy(n = 500)

df_default

plot(df_default)

df <- sim_response_xy(
  n = 500, 
  x_dist = purrr::partial(runif, min = -1, max = 1),
  # relationship = function(x, y) sqrt(abs(x)) - x - 0.5 > sin(y),
  relationship = function(x, y) sin(x*pi) > sin(y*pi),
  noise = 0.15
  )

plot(df)

Fit classification algorithms

Logistic Regression

df_lr <- fit_logistic_regression(df)

df_lr

plot(df_lr)

By default the model use order = 1 of the variables, i.e, response ~ x + y. We can get a better fit if we increase the order.

df_lr2 <- fit_logistic_regression(df, order = 4, stepwise = TRUE)

attr(df_lr2, "model")

plot(df_lr2)

Testing various orders.

orders <- c(1, 2, 3, 4)

orders |> 
  purrr::map(fit_logistic_regression, df = df) |> 
  purrr::map(plot) |> 
  purrr::reduce(`+`) +
  patchwork::plot_layout(guides = "collect") &
  theme_void() + theme(legend.position = "none")

Classification Tree

(partykit::ctree)

df_rt <- fit_classification_tree(df)

plot(df_rt)

plot(fit_classification_tree(df, alpha = 0.25))

The region are filled with the probability of the respective node. We can specify the type of the prediction using the type argument. In the case of response.

df_rt_response <- fit_classification_tree(df, type = "response")

df_rt_response

plot(df_rt_response)

And now for the node.

df_rt_node <- fit_classification_tree(df, type = "node", maxdepth = 3)

df_rt_node

plot(df_rt_node)

plot(attr(df_rt_node, "model"))

K Nearest Neighbours

With the class::knn implementation.

# defaults to prob
fit_knn(df)

fit_knn(df, type = "response")

plot(fit_knn(df))

neighbours <- c(3, 10, 50, 300)

purrr::map(neighbours, fit_knn, df = df) |> 
  purrr::map(plot) |> 
  purrr::reduce(`+`) +
  patchwork::plot_layout(guides = "collect") &
  theme_void() + theme(legend.position = "none")

purrr::map(neighbours, fit_knn, df = df, type = "response") |> 
  purrr::map(plot) |> 
  purrr::reduce(`+`) +
  patchwork::plot_layout(guides = "collect")  &
  theme_void() + theme(legend.position = "none")

Random Forest

Using the ranger::ranger function.

df_crf <- fit_classification_random_forest(df)

plot(df_crf)

df_crf

jbkunst/klassets documentation built on Dec. 7, 2022, 9:18 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jbkunst/klassets
Tools to simulate data set to teach Statistical Models and ML Algorithms

In jbkunst/klassets: Tools to simulate data set to teach Statistical Models and ML Algorithms

Generating data set

Fit classification algorithms

Logistic Regression

Classification Tree

K Nearest Neighbours

Random Forest

R Package Documentation

Browse R Packages

We want your feedback!

jbkunst/klassets Tools to simulate data set to teach Statistical Models and ML Algorithms

In jbkunst/klassets: Tools to simulate data set to teach Statistical Models and ML Algorithms

Generating data set

Fit classification algorithms

Logistic Regression

Classification Tree

K Nearest Neighbours

Random Forest

R Package Documentation

Browse R Packages

We want your feedback!

jbkunst/klassets
Tools to simulate data set to teach Statistical Models and ML Algorithms