knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(projectPackage)
The goal of projectPackage is to provide simple functions for data preprocessing in exploratory data analysis. The package makes the following steps quick and easy: dropping unwanted columns with data_cleaning(), visualizing correlations with correlation_graph(), and creating a recipe that scales and centers the predictors with recipe_scale_center().
This document introduces you to all of the tools mentioned above, with examples of how to apply them to data frames in different real-world scenarios.
mtcars
For the purpose of demonstration, we will use a basic data set called mtcars. This data set has 11 columns and 32 rows and is documented in ?mtcars. Note that mtcars is a data frame; its first 6 rows are shown below.
head(mtcars)
data <- mtcars
data_cleaning()
Often, when you work with large data sets, some columns are redundant or not of interest. In that case, you can use our function data_cleaning() to drop those columns. For details on how to use the function, check out help(data_cleaning).
Drop mpg from mtcars
head(data_cleaning(data, "mpg"))
As you can see above, the column mpg has been dropped from the data frame.
head(data_cleaning(data, c("mpg","disp","qsec")))
Again, columns mpg, disp, and qsec have been dropped.
If you accidentally pass a column name that does not exist, for instance mpf instead of mpg, the function stops with an error.
data_cleaning(data, "mpf") will produce the following message: "Error in data_cleaning(data, "mpf") : column name does not exist in the data frame"
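The behaviour described above (drop the named columns, stop if a name is not in the data frame) can be sketched in base R. This is only an illustration under assumptions: drop_columns_sketch() is a hypothetical stand-in, not the source of data_cleaning(), and its error text only resembles the message quoted above.

```r
# Hypothetical stand-in for illustration; NOT the package's data_cleaning().
drop_columns_sketch <- function(df, cols) {
  stopifnot(is.data.frame(df), is.character(cols))
  # Find requested names that are not columns of df
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("column name does not exist in the data frame: ",
         paste(missing_cols, collapse = ", "))
  }
  # Keep every column that was not asked to be dropped
  df[, setdiff(names(df), cols), drop = FALSE]
}

cleaned <- drop_columns_sketch(mtcars, c("mpg", "disp", "qsec"))
names(cleaned)

# tryCatch() captures the error message so a script can continue
msg <- tryCatch(
  drop_columns_sketch(mtcars, "mpf"),
  error = function(e) conditionMessage(e)
)
msg
```

Wrapping the call in tryCatch() is one way to handle a mistyped column name inside a longer script without halting it.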
correlation_graph()
Visualizations are often needed to better understand raw or modified data. Data visualization helps break down complex problems by transforming data into a more understandable format and showing trends and outliers. A good visualization tells a story by reducing noise in the data and emphasizing the most important facts.
GGally's ggpairs() function can create a correlation matrix plot for us. However, the plot can be hard to read if there is a lot of data, and it can look a bit boring because it lacks colour. The correlation_graph() function addresses both problems.
For details on how to use the function, check out help(correlation_graph).
correlation_graph(data[1:3])
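To check the numbers behind such a plot, the pairwise correlations of the same three columns can be computed with base R's cor(). This is a small sketch for comparison only; it assumes nothing about how correlation_graph() computes or draws its output.

```r
# Pearson correlations for the first three columns of mtcars (mpg, cyl, disp)
cor_mat <- cor(mtcars[1:3])

# Rounded for easier reading
round(cor_mat, 2)
```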
recipe_scale_center()
To build a model, we first have to create a recipe that specifies a formula and any additional preprocessing steps. More often than not, we also want to scale and center the data. recipe_scale_center() does both: it creates a recipe and scales and centers all the predictors in the given formula.
For example, we can create a recipe using mpg as the target and hp and cyl as the predictors.
recipe_scale_center(data, mpg ~ hp + cyl)
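For comparison, the same steps can be written out by hand with the recipes package. This is a sketch that assumes recipe_scale_center() is roughly equivalent to the calls below; the vignette does not show its implementation, so treat the step order and arguments as assumptions.

```r
library(recipes)

# Build the recipe, then add scaling and centering for all predictors
rec <- recipe(mpg ~ hp + cyl, data = mtcars)
rec <- step_scale(rec, all_predictors())
rec <- step_center(rec, all_predictors())

# Estimate the scaling/centering parameters, then apply them to the training data
prepped <- prep(rec, training = mtcars)
baked <- bake(prepped, new_data = NULL)

# After scaling and centering, each predictor has mean ~0 and sd ~1
sapply(baked[c("hp", "cyl")], mean)
sapply(baked[c("hp", "cyl")], sd)
```

A recipe only records the steps; prep() estimates the means and standard deviations, and bake() applies them.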