r descr_models("rand_forest", "spark")

Tuning Parameters

defaults <- 
  tibble::tibble(parsnip = c("mtry", "trees", "min_n"),
                 default = c("see below", "20L", "1L"))

param <-
  rand_forest() %>% 
  set_engine("spark") %>% 
  make_parameter_list(defaults)

This model has r nrow(param) tuning parameters:

param$item

mtry depends on the number of columns and the model mode. The default in [sparklyr::ml_random_forest()] is floor(sqrt(ncol(x))) for classification and floor(ncol(x)/3) for regression.
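As a rough illustration of the rule above, the sketch below computes both defaults in base R. The helper `default_mtry()` is hypothetical (it is not part of parsnip or sparklyr) and assumes `p` is the number of predictor columns.

```r
# Hypothetical helper, for illustration only: reproduces the default mtry
# rule quoted above for a predictor set with `p` columns.
default_mtry <- function(p, mode = c("classification", "regression")) {
  mode <- match.arg(mode)
  if (mode == "classification") {
    floor(sqrt(p))   # square root of the column count, rounded down
  } else {
    floor(p / 3)     # one third of the column count, rounded down
  }
}

default_mtry(20, "classification")  # 4
default_mtry(20, "regression")      # 6
```

With 20 predictors, the classification default samples fewer columns per split than the regression default, which is why the value is listed as "see below" rather than as a single number.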

Translation from parsnip to the original package (regression)

rand_forest(
  mtry = integer(1),
  trees = integer(1),
  min_n = integer(1)
) %>%  
  set_engine("spark") %>% 
  set_mode("regression") %>% 
  translate()

min_rows() and min_cols() will adjust the chosen parameter values if they are not consistent with the actual data dimensions.
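The kind of adjustment described here can be sketched in base R as a simple cap on the requested value. The function `cap_to_rows()` below is a hypothetical stand-in written for illustration; it is not parsnip's implementation of min_rows(), and the warning text is an assumption.

```r
# Illustrative only: a hypothetical stand-in for this kind of adjustment,
# not the actual min_rows() code in parsnip.
cap_to_rows <- function(value, data) {
  n <- nrow(data)
  if (value > n) {
    # The requested value exceeds the number of rows, so fall back to n.
    warning(value, " rows were requested but the data has only ", n,
            " rows; ", n, " will be used.")
    value <- n
  }
  as.integer(value)
}

cap_to_rows(5, mtcars)                      # within bounds, returned as-is
suppressWarnings(cap_to_rows(500, mtcars))  # capped at nrow(mtcars)
```

The same idea applies column-wise: a value that exceeds the number of predictors would be reduced to the number of columns available.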

Translation from parsnip to the original package (classification)

rand_forest(
  mtry = integer(1),
  trees = integer(1),
  min_n = integer(1)
) %>% 
  set_engine("spark") %>% 
  set_mode("classification") %>% 
  translate()

Preprocessing requirements


Other details


Case weights


Note that, for spark engines, the case_weight argument value should be a character string to specify the column with the numeric case weights.

References



parsnip documentation built on Aug. 18, 2023, 1:07 a.m.