The laundRy package performs many standard preprocessing steps on Tidyverse tibbles before they are used in statistical analysis and machine learning. Its functionality includes categorizing column types, handling missing data (including imputation), transforming and standardizing columns, and feature selection. The laundRy package aims to remove much of the grunt work from the typical data science workflow, leaving the analyst more time and energy to devote to modelling!
You can install the released version of laundRy from CRAN with:
install.packages("laundRy")
And the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("UBC-MDS/laundRy")
categorize: Takes a dataframe and returns a list of lists labelled by column type (numerical, categorical, text), each containing the names of the columns of that type.
fill_missing: Takes a dataframe and, depending on user input, either removes all rows with missing values or fills missing values using mean, median, or regression imputation.
transform_columns: Takes a dataframe and applies pre-processing techniques to each column. Categorical columns are one-hot encoded and numerical columns are scaled.
feature_selector: Takes a dataframe with X and y columns specified, a target task (regression or classification), and a maximum number of features to select. Returns the most important features for the target task.
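To make the first three steps concrete, here is a minimal base R sketch of the underlying techniques: sorting columns by type, mean imputation, scaling, and one-hot encoding. It does not call laundRy and is not the package's implementation. The toy data, the helper `classify_column`, and the `max_levels` threshold for separating categorical from text columns are all assumptions made for illustration.

```r
# Illustrative base R sketch of the techniques described above.
# NOT laundRy's API or implementation; all names here are hypothetical.

# Toy data: one numeric, one categorical, and one free-text column.
df <- data.frame(
  age  = c(23, NA, 35, 41),
  city = c("Vancouver", "Toronto", "Vancouver", "Toronto"),
  note = c("first visit", "returning customer", "asked for refund", "no comment"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label a column as numerical, categorical, or text.
# Assumption: a non-numeric column with at most `max_levels` distinct
# values is treated as categorical, otherwise as text.
classify_column <- function(x, max_levels = 3) {
  if (is.numeric(x)) {
    "numerical"
  } else if (length(unique(x[!is.na(x)])) <= max_levels) {
    "categorical"
  } else {
    "text"
  }
}

# Step 1: group column names by their detected type.
col_types   <- vapply(df, classify_column, character(1))
column_list <- split(names(col_types), col_types)

# Step 2: mean imputation for every numerical column.
for (col in column_list$numerical) {
  missing <- is.na(df[[col]])
  df[[col]][missing] <- mean(df[[col]], na.rm = TRUE)
}

# Step 3a: standardize a numerical column to zero mean and unit variance.
age_scaled <- as.numeric(scale(df$age))

# Step 3b: one-hot encode a categorical column (one indicator column per level).
one_hot <- model.matrix(~ city - 1, data = df)
```

After running this, `column_list` holds the column names grouped under `categorical`, `numerical`, and `text`, and the missing `age` value has been replaced by the mean of the observed ages. Median or regression imputation would follow the same pattern with a different fill value.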
mice offers functionality similar to the fill_missing function, but it is not integrated with a column categorizer.
The main feature selection and preprocessing package in R is caret, which offers functionality similar to our feature_selector function, though laundRy aims to make the workflow more efficient and integrates imputation into it.
As far as we know, there are no similar packages for categorizing columns and returning a list of the categorized columns. laundRy is the first package we are aware of to abstract away the full dataframe pre-processing workflow behind a unified and simple API.
This is a basic example which shows you how to solve a common problem:
library(laundRy)
## basic example code