knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Quick and easy way to clean data and build exploratory data analysis plots.
This idea came up as we have been building data projects for quite some time now in the UBC MDS program. We noticed that there are some repetitive activities that occur when we first explore the data. This project will help you take a given raw data set an conduct some data cleansing and plotting with a minimal amount of code.
The main components of this package are:
To get started with instaeda, we load the library and some sample data (palmerpenguins) to showcase this package with:
library(instaeda) library(palmerpenguins)
Let's try each function by the main components of this package.
First we will use the penguins data set as an example data frame to be used in the examples.
input_df <- palmerpenguins::penguins head(input_df)
Data Checking
With the function plot_info() you can generate a plot with a basic summary metrics of the data such as the distribution of numeric columns, factor columns , complete rows and missing observations.
plot_info(input_df)
Data Cleansing
With the function divide_and_fill(), you can impute missing values in numerical columns. You can fill the the missing values with:
mean
median
random
In addition, you can also shuffle the data frame.
divide_and_fill(input_df, strategy='median', random=TRUE)
Exploratory Visualization
With the function plot_corr(), you can generate a correlation plot on numerical columns with one of the following correlation methods:
pearson
kendall
spearman
plot_corr(input_df, method='pearson')
With the function plot_basic_distributions(), you can generate basic distribution plots for factor, character and/or numeric columns and access each plot in a named list.
plot_basic_distributions(input_df)
We hope this package will help you with your initial exploratory analysis in your projects.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.