r descr_models("rand_forest", "spark")

Tuning Parameters

defaults <- 
  tibble::tibble(parsnip = c("mtry", "trees", "min_n"),
                 default = c("see below", "20L", "1L"))

param <-
  rand_forest() %>% 
  set_engine("spark") %>% 
  make_parameter_list(defaults)

This model has r nrow(param) tuning parameters:

param$item

mtry depends on the number of columns and the model mode. The default in [sparklyr::ml_random_forest()] is floor(sqrt(ncol(x))) for classification and floor(ncol(x)/3) for regression.

Translation from parsnip to the original package (regression)

rand_forest(
  mtry = integer(1),
  trees = integer(1),
  min_n = integer(1)
) %>%  
  set_engine("spark") %>% 
  set_mode("regression") %>% 
  translate()

min_rows() and min_cols() will adjust the number of neighbors if the chosen value if it is not consistent with the actual data dimensions.

Translation from parsnip to the original package (classification)

rand_forest(
  mtry = integer(1),
  trees = integer(1),
  min_n = integer(1)
) %>% 
  set_engine("spark") %>% 
  set_mode("classification") %>% 
  translate()

Preprocessing requirements


Other details


Case weights


Note that, for spark engines, the case_weight argument value should be a character string to specify the column with the numeric case weights.

References



topepo/parsnip documentation built on April 16, 2024, 3:23 a.m.