knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
First of all, thank you for using healthyR.ai. If you encounter issues or want
to make a feature request, please visit https://github.com/spsanderson/healthyR.ai/issues
library(healthyR.ai)
In this example we will showcase the pca_your_recipe() function. This
function takes only a few arguments: .data, the full data set that gets passed
internally to the recipes::bake() function; .recipe_object, a recipe you have
already made and want to pass to the function in order to perform the PCA; and
.threshold, the fraction of the variance that should be captured by the
components.
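To make the idea of a variance threshold concrete, here is a minimal sketch using base R's prcomp() on the built-in mtcars data. It is an illustration only and does not reproduce the internals of pca_your_recipe(); it assumes the threshold behaves like the threshold argument of recipes::step_pca(), that is, enough components are kept to reach the requested share of total variance. The names pca_fit, var_explained, cum_var and n_comp are made up for this example.

```r
# Illustration only: base R PCA on a built-in data set,
# not the healthyR.ai implementation.
pca_fit <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Share of total variance explained by each principal component
var_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)

# Running total of explained variance across components
cum_var <- cumsum(var_explained)

# Smallest number of components whose cumulative share reaches the threshold
threshold <- 0.8
n_comp <- which(cum_var >= threshold)[1]

round(cum_var, 3)
n_comp
```

A higher threshold keeps more components, and a lower one keeps fewer.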
To start this walkthrough we will first load a few libraries.
library(timetk)
library(dplyr)
library(purrr)
library(healthyR.data)
library(rsample)
library(recipes)
library(ggplot2)
library(plotly)
Now that we have our libraries loaded, we can go ahead and get our data set ready.
data_tbl <- healthyR_data %>%
  select(visit_end_date_time) %>%
  summarise_by_time(
    .date_var = visit_end_date_time,
    .by       = "month",
    value     = n()
  ) %>%
  set_names("date_col", "value") %>%
  filter_by_time(
    .date_var   = date_col,
    .start_date = "2013",
    .end_date   = "2020"
  ) %>%
  mutate(date_col = as.Date(date_col))

head(data_tbl)
The data set is simple and by itself would not be at all useful for PCA,
since there is only one predictor: time. To facilitate the use of the function
in this example, we will create a splits object and a recipe object.
splits <- initial_split(data = data_tbl, prop = 0.8)

splits

head(training(splits))
rec_obj <- recipe(value ~ ., training(splits)) %>%
  step_timeseries_signature(date_col) %>%
  step_rm(matches("(iso$)|(xts$)|(hour)|(min)|(sec)|(am.pm)"))

rec_obj

get_juiced_data(rec_obj) %>% glimpse()
Now that we have our initial recipe we can use the pca_your_recipe() function.
pca_list <- pca_your_recipe(
  .recipe_object = rec_obj,
  .data          = data_tbl,
  .threshold     = 0.8,
  .top_n         = 5
)
The function returns a list object and does so invisibly, so you must assign
the output to a variable. You can then access the items of the list in the usual
manner.
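For readers less familiar with invisible return values, the short sketch below shows the general behavior in base R. The function make_results() and its contents are hypothetical and exist only for this illustration; they are not part of healthyR.ai.

```r
# Hypothetical helper, used only to show how an invisible return behaves.
make_results <- function() {
  out <- list(a = 1:3, b = "text")
  invisible(out)
}

make_results()          # nothing is printed at the console
res <- make_results()   # assign the result to keep the list

res$a                   # access an item by name with $
res[["b"]]              # or with [[ ]]
names(res)              # list the available items
```

The same pattern applies to the list assigned to pca_list above: items are retrieved with $ or [[ ]].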
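The following items are included in the output of the function:

1. pca_transform
2. variable_loadings
3. variable_variance
4. pca_estimates
5. pca_juiced_estimates
6. pca_baked_data
7. pca_rotation_df
8. pca_variance_df
9. pca_variance_scree_plt
10. pca_loadings_plt
11. pca_top_n_loadings_plt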
Let's start going down the list of items.
The pca_transform item is the one you will want to assign to its own variable, since it is the recipe object itself that you will use further along in your work.
pca_rec_obj <- pca_list$pca_transform
pca_rec_obj
pca_list$variable_loadings
pca_list$variable_variance
pca_list$pca_estimates
pca_list$pca_juiced_estimates %>% glimpse()
pca_list$pca_baked_data %>% glimpse()
pca_list$pca_rotation_df %>% glimpse()
pca_list$pca_variance_df %>% glimpse()
pca_list$pca_variance_scree_plt
pca_list$pca_loadings_plt
pca_list$pca_top_n_loadings_plt