#| child: aaa.Rmd
#| include: false

`r descr_models("rand_forest", "spark")`

Tuning Parameters

#| label: spark-param-info
#| echo: false
defaults <- 
  tibble::tibble(parsnip = c("mtry", "trees", "min_n"),
                 default = c("see below", "20L", "1L"))

param <-
  rand_forest() |> 
  set_engine("spark") |> 
  make_parameter_list(defaults)

This model has `r nrow(param)` tuning parameters:

#| label: spark-param-list
#| echo: false
#| results: asis
param$item

mtry depends on the number of columns and the model mode. The default in [sparklyr::ml_random_forest()] is floor(sqrt(ncol(x))) for classification and floor(ncol(x)/3) for regression.
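As a rough illustration of the two default rules above, the sketch below computes them in plain R. The helper `default_mtry()` and its argument `n_predictors` are hypothetical names used only for this example; they are not part of parsnip or sparklyr, and `n_predictors` is assumed to stand in for `ncol(x)`.

```r
# Hypothetical helper (not a parsnip or sparklyr function) that mirrors
# the default rules described above.
default_mtry <- function(n_predictors, mode = c("classification", "regression")) {
  mode <- match.arg(mode)
  if (mode == "classification") {
    floor(sqrt(n_predictors))   # floor(sqrt(ncol(x)))
  } else {
    floor(n_predictors / 3)     # floor(ncol(x) / 3)
  }
}

# With 20 predictors: sqrt(20) is about 4.47 and 20 / 3 is about 6.67
default_mtry(20, "classification")  # 4
default_mtry(20, "regression")      # 6
```

The two rules diverge as the number of predictors grows, so an explicit `mtry` may be worth setting when comparing results across modes.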

Translation from parsnip to the original package (regression)

#| label: spark-reg
rand_forest(
  mtry = integer(1),
  trees = integer(1),
  min_n = integer(1)
) |>  
  set_engine("spark") |> 
  set_mode("regression") |> 
  translate()

min_rows() and min_cols() will adjust the chosen parameter value if it is not consistent with the actual data dimensions.
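To see what this adjustment looks like in practice, the sketch below calls `min_cols()` and `min_rows()` directly on a local data frame instead of a Spark table. It assumes parsnip is installed and uses the built-in `mtcars` data purely for illustration; the requested value of 100 is arbitrary.

```r
library(parsnip)

# mtcars has 32 rows and 11 columns, so a request of 100 is too large
# for either dimension. The helpers may emit a warning when they adjust
# the value; it is suppressed here to keep the output short.
capped_cols <- suppressWarnings(min_cols(100, mtcars))
capped_rows <- suppressWarnings(min_rows(100, mtcars))

capped_cols  # expected to be no larger than ncol(mtcars)
capped_rows  # expected to be no larger than nrow(mtcars)
```

In a fitted model these helpers are evaluated inside the translated call, so the adjustment happens at fit time rather than when the model specification is created.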

Translation from parsnip to the original package (classification)

#| label: spark-cls
rand_forest(
  mtry = integer(1),
  trees = integer(1),
  min_n = integer(1)
) |> 
  set_engine("spark") |> 
  set_mode("classification") |> 
  translate()

Preprocessing requirements

#| child: template-tree-split-factors.Rmd

Other details

#| child: template-spark-notes.Rmd

Case weights

#| child: template-uses-case-weights.Rmd

Note that, for spark engines, the value of the case_weight argument should be a character string naming the column that contains the numeric case weights.

Prediction types

#| label: predict-types

parsnip:::get_from_env("rand_forest_predict") |>
  dplyr::filter(engine == "spark") |>
  dplyr::select(mode, type)

References




parsnip documentation built on Jan. 11, 2026, 9:06 a.m.