README.md

eda

EDA is critical to every Data Analytics engagement. The intent is to:

Installation

You can install eda from GitHub with:
install.packages("devtools")
devtools::install_github("gramener/eda")
Next, you load the package with:
library(eda)
To view all the functions that the package contains do:
help(package = eda)
To view the usage of the available functions do:
help(metadata)
help(univariate)
help(bivariate)

Metadata Creation :

Functional capability to read any file format and provide a tabular exportable format containing :
To compute the metadata do:
meta <- metadata$new(path = "Specify path to the file you want to conduct EDA on")
To view the metadata output in the console do:
meta$output()
You can change the contents of the metadata. For example, to change the type of the age column from continuous to discrete:
meta$columns$age$type <- "discrete"
To output the metadata in a structured format into Excel, do the following. If no sheet name is specified, the sheet is named "Metadata".
meta$save(savepath = "path to the existing excel file or a new excel file to be created", sheet = "Metadata Analysis")

Univariate Analysis:

Functional capability to read any file format and provide a tabular exportable format containing :
provide a chart containing :

Binning: https://en.wikipedia.org/wiki/Freedman%E2%80%93Diaconis_rule

To conduct univariate analysis, do the following. The argument k is the multiplier in the outlier detection rule (mean +/- k * standard deviation). By default k = 3.
uni <- univariate$new(metadata = meta,k = 3)
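The mean +/- k * standard deviation rule can be sketched in base R as follows. This is an illustration only, not the package's own implementation; the function name flag_outliers and the sample vector are hypothetical.

```r
# Flag values lying more than k standard deviations from the mean.
# Hypothetical helper for illustration; eda's internals may differ.
flag_outliers <- function(x, k = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  x < m - k * s | x > m + k * s
}

# Made-up sample: twenty ordinary values and one extreme value.
x <- c(rep(10, 20), 100)
outlier_positions <- which(flag_outliers(x, k = 3))
print(outlier_positions)
```

A smaller k flags more points; a larger k flags fewer.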
To view the univariate analysis in the console do:
uni$output()
To output the univariate analysis in a structured format into Excel, do the following. If no sheet name is specified, the sheet is named "Univariate".
uni$save(path = "path to the existing excel file or a new excel file to be created",sheet = "Univariate Analysis")
To output the univariate plots in a structured format into Excel, do the following. By default the histogram uses the Freedman-Diaconis rule to determine the number of breaks. To set the breaks manually, pass the breaks argument.
uni$saveplot(path = "path to the existing excel file or a new excel file to be created")
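The Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3). A minimal base R sketch is given here for reference; fd_bin_count is a hypothetical helper written for illustration, and how eda applies the rule internally is an assumption not confirmed by this README.

```r
# Number of histogram bins under the Freedman-Diaconis rule.
# Hypothetical helper; not part of the eda package.
fd_bin_count <- function(x) {
  x <- x[!is.na(x)]
  width <- 2 * IQR(x) / length(x)^(1 / 3)  # bin width h
  if (width == 0) return(1L)               # degenerate case: no spread in the middle half
  as.integer(ceiling(diff(range(x)) / width))
}

x <- iris$Sepal.Length
bins <- fd_bin_count(x)
print(bins)
```

Base R offers the same calculation through nclass.FD(x), and hist(x, breaks = "FD") uses it when drawing a histogram.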

Bivariate Analysis

Functional capability to read any file format and provide a tabular exportable format containing :

Bivariate Tables for
- Categorical - Categorical Variable : Cross Tab of Count and Proportion of Records 
- Numeric - Categorical Variable : Sum, Average, Min, Max of Records
Bivariate Plots
- Numeric - Categorical Variable : Bar Plot for Sum, Average, Min, Max Records
- Numeric - Numeric : Scatter Plot and Correlation Plot
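The kinds of tables listed above can be approximated in base R. The sketch below uses the built-in mtcars and iris datasets and only base functions; it shows the idea and is not the output format that eda itself produces.

```r
# Categorical vs categorical: cross tab of counts and proportions.
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)
props  <- prop.table(counts)  # each cell as a share of all records

# Numeric vs categorical: sum, average, min and max per group.
by_species <- split(iris$Sepal.Length, iris$Species)
summ <- t(sapply(by_species, function(v) {
  c(sum = sum(v), mean = mean(v), min = min(v), max = max(v))
}))

print(counts)
print(round(props, 3))
print(summ)
```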
To conduct bivariate analysis, do the following.
bi <- bivariate$new(metadata = meta)
To view the bivariate analysis in the console do:
bi$output()
To output the bivariate tables in a structured format into Excel, do the following.
bi$save(path = "path to the existing excel file or a new excel file to be created")
To output the bivariate plots in a structured format into Excel, do the following. The method argument selects the correlation method, which can be either "pearson" or "spearman".
bi$saveplot(path = "path to the existing excel file or a new excel file to be created", method = "pearson")
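To see what the two correlation methods compute, the base R function cor accepts the same method names. This sketch uses the numeric columns of the built-in iris dataset; whether eda calls cor internally is an assumption.

```r
# Keep only the numeric columns of iris.
num <- iris[sapply(iris, is.numeric)]

# Pearson measures linear association; Spearman works on ranks,
# so it captures monotonic association and is less sensitive to outliers.
pearson  <- cor(num, method = "pearson")
spearman <- cor(num, method = "spearman")

print(round(pearson, 2))
print(round(spearman, 2))
```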

Example

This basic example runs the full workflow (metadata, univariate and bivariate analysis) on the built-in iris dataset:

## Install the eda package (requires devtools)
devtools::install_github("gramener/eda")

##Load the eda package
library(eda)

## Compute the metadata for the iris dataset
meta <- metadata$new(data = iris)

## View the metadata output in the console
meta$output()

## Save the metadata output to an xlsx file
meta$save(savepath = "C:/Users/Admin/Desktop/Output.xlsx")

## Compute the univariate analysis
uni <- univariate$new(metadata = meta)

## View the univariate analysis in the console
uni$output()

## Save the univariate analysis to an xlsx file
uni$save(savepath = "C:/Users/Admin/Desktop/Output.xlsx")

## Save the univariate plots to an xlsx file
uni$saveplot(savepath = "C:/Users/Admin/Desktop/Output.xlsx")

## Compute the bivariate analysis
bi <- bivariate$new(metadata = meta)

## View the bivariate analysis in the console
bi$output()

## Save the bivariate analysis to an xlsx file
bi$save(savepath = "C:/Users/Admin/Desktop/Output.xlsx")

## Save the bivariate plots to an xlsx file
bi$saveplot(savepath = "C:/Users/Admin/Desktop/Output.xlsx", method = "pearson")


nolancardozo13/eda documentation built on May 12, 2019, 8:47 a.m.