README.md

dataPreparation


Data preparation accounts for about 80% of the work during a data science project. Let's take that number down. dataPreparation will allow you to do most of the painful data preparation for a data science project with a minimum amount of code.

This package is:

- fast (uses data.table and exponential search)
- RAM efficient (performs operations by reference and column-wise to avoid copying data)
- stable (most exceptions are handled)
- verbose (logs a lot)
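To illustrate what "by reference" means here, the following is a minimal sketch using plain data.table rather than code taken from dataPreparation itself. The toy table and its column names are hypothetical. The `:=` operator adds a column in place, so the table is not copied in order to modify it.

```r
library(data.table)

# Hypothetical toy table; the column names are illustrative only.
dt <- data.table(id = 1:3, value = c(1.5, 2.5, 3.5))

# Record the memory address of the table before modifying it.
address_before <- address(dt)

# `:=` adds a column in place instead of building a modified copy.
dt[, value_doubled := value * 2]

# Compare addresses to check whether the table object was reallocated.
same_object <- identical(address_before, address(dt))
print(same_object)
print(dt)
```

On large data sets, avoiding such copies is what keeps memory usage low when many columns are transformed one after another.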

Main preparation steps

Before using any machine learning (ML) algorithm, one needs to prepare the data. Preparing a data set for a data science project can be long and tricky. The main steps are the following, and the table below lists the functions this package provides for each of them:

| Correct | Transform | Filter | Pre model manipulation | Shape |
|---------|-----------|--------|------------------------|-------|
| un_factor | generate_date_diffs | fast_filter_variables | fast_handle_na | shape_set |
| find_and_transform_dates | generate_factor_from_date | which_are_constant | fast_discretization | same_shape |
| find_and_transform_numerics | aggregate_by_key | which_are_in_double | fast_scale | set_as_numeric_matrix |
| set_col_as_character | generate_from_factor | which_are_bijection | | one_hot_encoder |
| set_col_as_numeric | generate_from_character | remove_sd_outlier | | |
| set_col_as_date | fast_round | remove_rare_categorical | | |
| set_col_as_factor | target_encode | remove_percentile_outlier | | |

All of those functions are integrated in the full pipeline function prepare_set.
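As a rough sketch of running a few of these steps by hand instead of through prepare_set, the example below chains some functions from the Filter and Correct columns on the bundled messy_adult data. It assumes each function takes the data set as its first argument and that the default values of the other arguments are acceptable. The order of the steps is an illustrative choice, not necessarily the order prepare_set uses.

```r
library(dataPreparation)

# Load the toy data set shipped with the package.
data(messy_adult)

# Work on a copy, since the functions may modify their input by reference.
adult <- data.table::copy(messy_adult)

# Filter: list constant columns, then drop uninformative columns.
constant_cols <- which_are_constant(adult)
adult <- fast_filter_variables(adult)

# Correct: detect columns that hold numbers or dates stored as text
# and convert them to proper types.
adult <- find_and_transform_numerics(adult)
adult <- find_and_transform_dates(adult)

# Inspect the resulting column types.
print(sapply(adult, function(col) class(col)[1]))
```

Calling the functions one by one like this can be useful when only part of the pipeline is needed, or when the result of each step should be checked before moving on.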

For more details on how it works, check our tutorial.

Getting started: 30 seconds to dataPreparation

Installation

Install the package from CRAN:

install.packages("dataPreparation")

To get the latest features, install the package from GitHub:

library(devtools)
install_github("ELToulemonde/dataPreparation")

Test it

Load a toy data set

library(dataPreparation)
data(messy_adult)
head(messy_adult)

Run the full pipeline function

clean_adult <- prepare_set(messy_adult)
head(clean_adult)

That's it. For every function, you can check the documentation and/or the tutorial vignette.

How to Contribute

dataPreparation has been developed and used by many active community members. Your help is very valuable to make it better for everyone.

For more details, please refer to CONTRIBUTING.



ELToulemonde/dataPreparation documentation built on July 19, 2023, 11:45 a.m.