In topepo/parsnip: A Common API to Modeling and Analysis Functions

`r descr_models("rand_forest", "spark")`

Tuning Parameters

```r
defaults <-
  tibble::tibble(parsnip = c("mtry", "trees", "min_n"),
                 default = c("see below", "20L", "1L"))

param <-
  rand_forest() %>%
  set_engine("spark") %>%
  make_parameter_list(defaults)
```

This model has `r nrow(param)` tuning parameters:

```r
param$item
```

The default for `mtry` depends on the number of columns and the model mode. The default in [sparklyr::ml_random_forest()] is `floor(sqrt(ncol(x)))` for classification and `floor(ncol(x)/3)` for regression.
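As a minimal sketch of the rule stated above, the hypothetical helper `default_mtry()` below (not part of parsnip or sparklyr) computes the documented default from a column count and a mode. It assumes `n_cols` is the number of predictor columns and only mirrors the formulas given here; it is not taken from the sparklyr or Spark source.

```r
# Hypothetical helper (not part of parsnip or sparklyr): mirrors the
# documented default `mtry` for a given number of predictor columns.
default_mtry <- function(n_cols, mode = c("classification", "regression")) {
  mode <- match.arg(mode)
  if (mode == "classification") {
    as.integer(floor(sqrt(n_cols)))   # floor(sqrt(ncol(x)))
  } else {
    as.integer(floor(n_cols / 3))     # floor(ncol(x) / 3)
  }
}

# With 20 predictor columns:
default_mtry(20, "classification")  # 4
default_mtry(20, "regression")      # 6
```

Because classification uses a square root and regression a one-third fraction, regression models generally try more candidate columns per split once there are more than nine predictors.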

Translation from parsnip to the original package (regression)

```r
rand_forest(
  mtry = integer(1),
  trees = integer(1),
  min_n = integer(1)
) %>%
  set_engine("spark") %>%
  set_mode("regression") %>%
  translate()
```

`min_rows()` and `min_cols()` will adjust the chosen value of a tuning parameter (such as `min_n` or `mtry`) if it is not consistent with the actual data dimensions.
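The idea can be sketched in base R. The hypothetical helpers `clamp_to_cols()` and `clamp_to_rows()` below are not parsnip's code; they only illustrate capping a requested value at the data's column or row count and warning when that happens, which is the kind of adjustment `min_cols()` and `min_rows()` make at fit time.

```r
# Hypothetical helpers illustrating the clamping idea behind min_cols()
# and min_rows(); they are not parsnip's implementation.
clamp_to_cols <- function(value, data) {
  if (value > ncol(data)) {
    warning("value reduced to ", ncol(data), " (number of columns)")
    value <- ncol(data)
  }
  value
}

clamp_to_rows <- function(value, data) {
  if (value > nrow(data)) {
    warning("value reduced to ", nrow(data), " (number of rows)")
    value <- nrow(data)
  }
  value
}

# A small data set with 3 columns and 5 rows
df <- data.frame(a = 1:5, b = 6:10, c = 11:15)

clamp_to_cols(2, df)   # 2, already valid
clamp_to_cols(10, df)  # 3, with a warning
clamp_to_rows(50, df)  # 5, with a warning
```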

Translation from parsnip to the original package (classification)

```r
rand_forest(
  mtry = integer(1),
  trees = integer(1),
  min_n = integer(1)
) %>%
  set_engine("spark") %>%
  set_mode("classification") %>%
  translate()
```
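Both translations rename the parsnip arguments to their sparklyr counterparts. A minimal sketch of that renaming step, assuming the mapping `mtry` to `feature_subset_strategy`, `trees` to `num_trees`, and `min_n` to `min_instances_per_node` (check the `translate()` output for your parsnip version). `arg_map` and `rename_args()` are hypothetical names for illustration, not parsnip functions, and the sketch ignores any value conversion parsnip may apply.

```r
# Assumed parsnip -> sparklyr argument names for the "spark" engine
arg_map <- c(
  mtry  = "feature_subset_strategy",
  trees = "num_trees",
  min_n = "min_instances_per_node"
)

# Hypothetical helper: rename parsnip-style arguments to sparklyr names,
# leaving values untouched and keeping names that have no mapping
rename_args <- function(args, map = arg_map) {
  mapped <- unname(map[names(args)])
  names(args) <- ifelse(is.na(mapped), names(args), mapped)
  args
}

str(rename_args(list(trees = 100L, min_n = 5L)))
```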