```{r}
#| child: aaa.Rmd
#| include: false
```
`r descr_models("rand_forest", "spark")`
```{r}
#| label: spark-param-info
#| echo: false
defaults <-
  tibble::tibble(
    parsnip = c("mtry", "trees", "min_n"),
    default = c("see below", "20L", "1L")
  )

param <-
  rand_forest() |>
  set_engine("spark") |>
  make_parameter_list(defaults)
```
This model has `r nrow(param)` tuning parameters:
```{r}
#| label: spark-param-list
#| echo: false
#| results: asis
param$item
```
`mtry` depends on the number of columns and the model mode. The default in [sparklyr::ml_random_forest()] is `floor(sqrt(ncol(x)))` for classification and `floor(ncol(x)/3)` for regression.
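The defaults described above can be sketched in base R. This is only an illustration of the two formulas: `default_mtry()` is a hypothetical helper, not a function in parsnip or sparklyr, and `x` is assumed to hold predictor columns only.

```r
# Hypothetical helper (not part of parsnip or sparklyr) that mirrors the
# two default formulas quoted above.
default_mtry <- function(x, mode = c("classification", "regression")) {
  mode <- match.arg(mode)
  p <- ncol(x)
  if (mode == "classification") {
    floor(sqrt(p))
  } else {
    floor(p / 3)
  }
}

# A toy predictor matrix with 20 columns.
x <- matrix(0, nrow = 5, ncol = 20)

default_mtry(x, "classification")  # floor(sqrt(20)) = 4
default_mtry(x, "regression")      # floor(20 / 3) = 6
```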
```{r}
#| label: spark-reg
rand_forest(
  mtry = integer(1),
  trees = integer(1),
  min_n = integer(1)
) |>
  set_engine("spark") |>
  set_mode("regression") |>
  translate()
```
`min_rows()` and `min_cols()` will adjust the chosen parameter value if it is not consistent with the actual data dimensions.
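As a rough illustration of this kind of adjustment, the sketch below clamps a requested value to the dimensions of the data. The helpers `clamp_to_rows()` and `clamp_to_cols()` are hypothetical and written in base R. They are an assumption about the general idea, not parsnip's implementation of `min_rows()` and `min_cols()`, which may differ in detail (for example, in how it warns).

```r
# Hypothetical helpers (not parsnip functions): keep a requested value
# within the range that the data can actually support.
clamp_to_rows <- function(value, data) {
  max(1L, min(as.integer(value), nrow(data)))
}

clamp_to_cols <- function(value, data) {
  max(1L, min(as.integer(value), ncol(data)))
}

predictors <- iris[, -5]  # 150 rows, 4 predictor columns

clamp_to_rows(500, predictors)  # more than 150 rows requested, so 150
clamp_to_cols(10, predictors)   # more than 4 columns requested, so 4
clamp_to_cols(2, predictors)    # already consistent, stays 2
```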
```{r}
#| label: spark-cls
rand_forest(
  mtry = integer(1),
  trees = integer(1),
  min_n = integer(1)
) |>
  set_engine("spark") |>
  set_mode("classification") |>
  translate()
```
```{r}
#| child: template-tree-split-factors.Rmd
```

```{r}
#| child: template-spark-notes.Rmd
```

```{r}
#| child: template-uses-case-weights.Rmd
```
Note that, for spark engines, the `case_weight` argument value should be a character string to specify the column with the numeric case weights.
```{r}
#| label: predict-types
parsnip:::get_from_env("rand_forest_predict") |>
  dplyr::filter(engine == "spark") |>
  dplyr::select(mode, type)
```