Data preparation accounts for about 80% of the work during a data science project. Let's take that number down. dataPreparation will allow you to do most of the painful data preparation for a data science project with a minimum amount of code.
This package is
- fast (uses data.table and exponential search)
- RAM efficient (perform operations by reference and column-wise to avoid copying data)
- stable (most exceptions are handled)
- verbose (log a lot)
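To illustrate what "by reference" means here, the following is a minimal sketch using data.table directly. It shows general data.table behaviour, not this package's internals; the toy table and its column names are made up for the example.

```r
library(data.table)

# Hypothetical toy table, only for illustration
dt <- data.table(id = 1:5, value = c(10, 20, 30, 40, 50))
addr_before <- address(dt)

# `:=` adds or updates a column in place instead of copying the whole table
dt[, value_scaled := value / max(value)]

# set() updates one column at a time, also in place
for (col in c("value", "value_scaled")) {
  set(dt, j = col, value = round(dt[[col]], 1))
}

# Compare the memory address of the table before and after the updates
addr_after <- address(dt)
identical(addr_before, addr_after)
```

Because the table is modified in place, make an explicit `copy()` first if you need to keep the original untouched.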
Before using any machine learning (ML) algorithm, one needs to prepare the data. Preparing a data set for a data science project can be long and tricky. The main steps are reading the data (for example with data.table::fread), then correcting, transforming, and filtering it, applying pre-model manipulations, and shaping it for the algorithm.
Here are the functions available in this package to tackle those issues:
Correct | Transform | Filter | Pre model manipulation | Shape
--------- | ----------- | -------- | ----------- | ------------------------
un_factor | generate_date_diffs | fast_filter_variables | fast_handle_na | shape_set
find_and_transform_dates | generate_factor_from_date | which_are_constant | fast_discretization | same_shape
find_and_transform_numerics | aggregate_by_key | which_are_in_double | fast_scale | set_as_numeric_matrix
set_col_as_character | generate_from_factor | which_are_bijection | | one_hot_encoder
set_col_as_numeric | generate_from_character | remove_sd_outlier | |
set_col_as_date | fast_round | remove_rare_categorical | |
set_col_as_factor | target_encode | remove_percentile_outlier | |
All of those functions are integrated in the full pipeline function prepare_set.
For more details on how it works, check out our tutorial.
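The individual functions can also be called one step at a time. The sketch below runs a few of them on the bundled messy_adult data set; it assumes each function accepts a `verbose` argument and returns the modified data set, and it relies on default values for everything else, so check each function's help page before depending on it.

```r
library(dataPreparation)
data(messy_adult)

# Work on a copy so the bundled data set is left untouched
adult <- data.table::copy(messy_adult)

# Correct: detect date columns stored as text and convert them
adult <- find_and_transform_dates(adult, verbose = FALSE)

# Filter: list constant columns, then drop uninformative columns
constant_cols <- which_are_constant(adult, verbose = FALSE)
adult <- fast_filter_variables(adult, verbose = FALSE)

# Pre model manipulation: replace missing values using the default rules
adult <- fast_handle_na(adult, verbose = FALSE)

dim(adult)
```

Calling the steps separately is mainly useful when only part of the pipeline is wanted, or when one step needs non-default settings.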
Install the package from CRAN:
install.packages("dataPreparation")
To get the latest features, install the package from GitHub:
library(devtools)
install_github("ELToulemonde/dataPreparation")
Load a toy data set
library(dataPreparation)
data(messy_adult)
head(messy_adult)
Run the full pipeline function
clean_adult <- prepare_set(messy_adult)
head(clean_adult)
That's it. For all functions, you can check out the documentation and/or the tutorial vignette.
dataPreparation has been developed and used by many active community members. Your help is very valuable to make it better for everyone.
For more details, please refer to CONTRIBUTING.