knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Quick and easy way to clean data and build exploratory data analysis plots.

This idea came up as we have been building data projects for quite some time now in the UBC MDS program. We noticed that there are some repetitive activities that occur when we first explore the data. This project will help you take a given raw data set an conduct some data cleansing and plotting with a minimal amount of code.

The main components of this package are:

To get started with instaeda, we load the library and some sample data (palmerpenguins) to showcase this package with:

library(instaeda)
library(palmerpenguins)

Let's try each function by the main components of this package.

First we will use the penguins data set as an example data frame to be used in the examples.

input_df <- palmerpenguins::penguins
head(input_df)

Data Checking

With the function plot_info() you can generate a plot with a basic summary metrics of the data such as the distribution of numeric columns, factor columns , complete rows and missing observations.

plot_info(input_df)

Data Cleansing

With the function divide_and_fill(), you can impute missing values in numerical columns. You can fill the the missing values with:

In addition, you can also shuffle the data frame.

divide_and_fill(input_df, strategy='median', random=TRUE)

Exploratory Visualization

With the function plot_corr(), you can generate a correlation plot on numerical columns with one of the following correlation methods:

plot_corr(input_df, method='pearson')

With the function plot_basic_distributions(), you can generate basic distribution plots for factor, character and/or numeric columns and access each plot in a named list.

plot_basic_distributions(input_df)

We hope this package will help you with your initial exploratory analysis in your projects.



UBC-MDS/instaeda_R documentation built on March 29, 2021, 7:55 a.m.