prepare_set | R Documentation |
Full pipeline for preparing your data set.
prepare_set(data_set, final_form = "data.table", verbose = TRUE, ...)
data_set | Matrix, data.frame or data.table |
final_form | "data.table" or "numerical_matrix" (default to data.table) |
verbose | Should the algorithm talk? (logical, default to TRUE) |
... | Additional parameters to tune pipeline (see details) |
Additional arguments are available to tune the pipeline:
key
Name of a column of data_set according to which data_set should be aggregated
(character)
analysis_date
A date at which the data_set should be aggregated
(differences between every date and analysis_date will be computed) (Date)
n_unfactor
Maximum number of values a factor may have; set it to -1 to disable the un_factor function (numeric, default to 53)
digits
Number of digits after the decimal separator (optional, numeric; if set, fast_round will be performed)
dateFormats
List of the date formats used in data_set (list of characters)
name_separator
character to separate parts of new column names (character, default to ".")
functions
Aggregation functions for numeric columns, see aggregate_by_key
(list of functions names (character))
factor_date_type
Aggregation level to factorize date (see generate_factor_from_date) (character, default to "yearmonth")
target_col
A target column to perform target encoding, see target_encode
(character)
target_encoding_functions
Functions to perform target encoding, see build_target_encoding; if target_col is not given, nothing will be done (list, default to "mean")
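To make the roles of key, functions and name_separator more concrete, here is a minimal base R sketch of aggregating one numeric column by a key with functions given by name. It does not call the package and is not its implementation; the toy data and the naming scheme for the new columns (function name, separator, column name) are assumptions made for illustration only.

```r
# Toy data (hypothetical values): two rows per country
toy <- data.frame(
  country = c("FR", "FR", "US", "US"),
  age = c(30, 40, 20, 50),
  stringsAsFactors = FALSE
)

# A user-defined aggregation function, referenced by name like the built-ins
power <- function(x) sum(x^2)
functions <- c("min", "max", "mean", "power")
name_separator <- "."

# Split the numeric column by the key, then apply every named function per group
groups <- split(toy$age, toy$country)
agg <- data.frame(country = names(groups), stringsAsFactors = FALSE)
for (f in functions) {
  # Assumed naming scheme: <function><separator><column>
  new_name <- paste(f, "age", sep = name_separator)
  agg[[new_name]] <- vapply(groups, match.fun(f), numeric(1), USE.NAMES = FALSE)
}
print(agg)
```

The result has one row per value of the key and one new column per aggregation function, which is the general shape to expect when key is supplied.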
A data.table or a numerical matrix (according to final_form).
It will perform the following steps:
Correct set: unfactor factors with many values, identify dates and numerics that are hidden in character columns
Transform set: compute differences between every date, transform dates into factors, generate features from character...; if key is provided, aggregate according to this key
Filter set: filter constant, duplicated ("in double") or bijection variables. If digits is provided, round numeric columns
Handle NA: will perform fast_handle_na
Shape set: will put the result in the requested shape (final_form) with acceptable column formats.
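As a rough illustration of what the filter step is meant to achieve, the base R sketch below drops constant columns and columns that duplicate an earlier one. It is a simplified stand-in written for this page, not the package's code: the toy data is hypothetical, and bijection detection is not covered.

```r
# Toy data (hypothetical values) with one constant and one duplicated column
toy <- data.frame(
  id = 1:4,
  constant = rep("a", 4),
  age = c(30, 40, 20, 50),
  age_copy = c(30, 40, 20, 50),
  stringsAsFactors = FALSE
)

# Constant columns: at most one distinct value
is_constant <- vapply(toy, function(col) length(unique(col)) <= 1, logical(1))

# Duplicated columns: same content as an earlier column (names are ignored)
is_double <- duplicated(as.list(toy))

# Keep only the columns that are neither constant nor duplicated
filtered <- toy[, !(is_constant | is_double), drop = FALSE]
print(names(filtered))
```

Only the informative columns remain, which keeps later steps (NA handling, shaping) from carrying redundant variables.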
# Load ugly set
## Not run:
data(tiny_messy_adult)
# Have a look at the set
head(tiny_messy_adult)
# Compute full pipeline
clean_adult <- prepare_set(tiny_messy_adult)
# With a reference date
adult_agg <- prepare_set(tiny_messy_adult, analysis_date = as.Date("2017-01-01"))
# Add aggregation by country
adult_agg <- prepare_set(tiny_messy_adult, analysis_date = as.Date("2017-01-01"), key = "country")
# With some new aggregation functions
power <- function(x) {sum(x^2)}
adult_agg <- prepare_set(tiny_messy_adult, analysis_date = as.Date("2017-01-01"), key = "country",
functions = c("min", "max", "mean", "power"))
## End(Not run)
# "## Not run:" means that this example has not been run on CRAN because it takes too long. But you can run it!