knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
sparkxgb is a sparklyr extension that provides an interface to XGBoost on Spark.
install.packages("sparkxgb")
You can install the development version of sparkxgb
with:
# install.packages("pak") pak::pak("rstudio/sparkxgb")
sparkxgb supports the familiar formula interface for specifying models:
library(sparkxgb) library(sparklyr) library(dplyr) sc <- spark_connect(master = "local") iris_tbl <- sdf_copy_to(sc, iris) xgb_model <- xgboost_classifier( iris_tbl, Species ~ ., num_class = 3, num_round = 50, max_depth = 4 ) xgb_model %>% ml_predict(iris_tbl) %>% select(Species, predicted_label, starts_with("probability_")) %>% glimpse()
It also provides a Pipelines API, which means you can use a xgboost_classifier
or xgboost_regressor
in a pipeline as any Estimator
, and do things like
hyperparameter tuning:
pipeline <- ml_pipeline(sc) %>% ft_r_formula(Species ~ .) %>% xgboost_classifier(num_class = 3) param_grid <- list( xgboost = list( max_depth = c(1, 5), num_round = c(10, 50) ) ) cv <- ml_cross_validator( sc, estimator = pipeline, evaluator = ml_multiclass_classification_evaluator( sc, label_col = "label", raw_prediction_col = "rawPrediction" ), estimator_param_maps = param_grid ) cv_model <- cv %>% ml_fit(iris_tbl) summary(cv_model) spark_disconnect(sc)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.