
AllSigTest

Univariate and bivariate analysis of data

Install

The package can be installed from GitHub using devtools:

devtools::install_github("SyedAkbarAhmad/AllSigTest")

Usage

The purpose of this package is to automatically run and view:

  1. Univariate distribution of all variables in a dataset (Histograms & Bar Plots).
  2. Tests of significance on the dataset for a particular target variable.

Example

Numeric target

For this demo I will use the Cars93 dataset from the MASS package. The Cars93 dataset describes 93 cars on sale in the USA in 1993. I am interested in finding the factors that contribute to the price of a vehicle.

library(MASS)

The packages it depends on:

library(corrplot)
library(gplots)

First I will remove variables that won't contribute to this analysis (such as model name and manufacturer name):

Cars93$Manufacturer <- NULL
Cars93$Model <- NULL
Cars93$Make <- NULL

Now, to run the package, provide the dataset and the name of the variable we wish to run the tests on:

AllSigTest(Cars93, "Price")

The output is divided into two sections:

Univariate Analysis


Histograms are plotted for all numerical variables:

[Figures: example histograms]

Bar Plots are plotted for all character/factor variables:

[Figures: example bar plots]
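The per-variable plots described above can be approximated in base R. This is a minimal sketch, not the package's own implementation: `plot_univariate` is a hypothetical helper that draws a histogram for each numeric column and a bar plot for every other column.

```r
library(MASS)  # provides the Cars93 data set

# Classify columns: numeric (including integer) versus everything else.
is_num <- vapply(Cars93, is.numeric, logical(1))
numeric_vars <- names(Cars93)[is_num]
categorical_vars <- names(Cars93)[!is_num]

# Hypothetical helper (not part of AllSigTest): one plot per column.
plot_univariate <- function(data) {
  for (v in names(data)) {
    x <- data[[v]]
    if (is.numeric(x)) {
      hist(x, main = v, xlab = v)            # histogram for numeric columns
    } else {
      barplot(table(x), main = v, las = 2)   # bar plot of level counts
    }
  }
  invisible(NULL)
}

# Plot one numeric and one factor column as a quick check.
plot_univariate(Cars93[c("Price", "AirBags")])
```

Passing the whole data frame instead of two columns would produce one plot per variable.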

A correlation plot is plotted for all the numeric variables:
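A comparable correlation plot can be sketched as follows. The use of pairwise-complete observations and the `method = "circle"` style are assumptions made for illustration; the package may handle missing values and styling differently.

```r
library(MASS)  # provides the Cars93 data set

# Keep only the numeric columns of the data set.
num_data <- Cars93[vapply(Cars93, is.numeric, logical(1))]

# Pairwise-complete observations are used because a few columns
# (for example Luggage.room) contain missing values.
corr_matrix <- cor(num_data, use = "pairwise.complete.obs")

# Draw the matrix with corrplot if that package is installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(corr_matrix, method = "circle")
}

# Correlations of Price with three of the other numeric variables.
round(corr_matrix["Price", c("Weight", "Horsepower", "MPG.city")], 2)
```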


Bivariate Analysis


Different tests are executed depending on the data type of the target variable. In this example, the target variable is Price, which is numeric, so the applicable analyses are ANOVA and correlation:
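The choice of test can be pictured as a small dispatch on the types of the two variables. The helper below is hypothetical and only illustrates the rule as this README describes it (correlation for two numeric variables, chi-square for two categorical ones, ANOVA for a mix); it is not taken from the package source.

```r
library(MASS)  # provides the Cars93 data set

# Hypothetical helper (not part of AllSigTest): pick a test from the
# types of the target and the predictor.
choose_test <- function(target, predictor) {
  target_num <- is.numeric(target)
  predictor_num <- is.numeric(predictor)
  if (target_num && predictor_num) {
    "correlation"   # numeric vs numeric
  } else if (!target_num && !predictor_num) {
    "chi-square"    # categorical vs categorical
  } else {
    "ANOVA"         # one numeric, one categorical
  }
}

choose_test(Cars93$Price, Cars93$Weight)    # numeric target, numeric predictor
choose_test(Cars93$Price, Cars93$AirBags)   # numeric target, factor predictor
choose_test(Cars93$Origin, Cars93$Type)     # factor target, factor predictor
```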

Example for scatter plots:

Price has a strong positive correlation with the weight of the vehicle. Heavier vehicles are priced higher.

Price has a strong negative correlation with the MPG of the vehicle. Cars with better fuel economy are cheaper.

Price has a strong positive correlation with the HP of the vehicle. Cars with higher HP are more expensive.
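These relationships can be checked directly with `cor.test` from base R. This is a sketch rather than the package's internals, and it assumes that "MPG" refers to the `MPG.city` column (Cars93 also has `MPG.highway`).

```r
library(MASS)  # provides the Cars93 data set

# Pearson correlation tests of Price against three numeric predictors.
# MPG.city is an assumption; the text only says "MPG".
predictors <- c("Weight", "MPG.city", "Horsepower")
cor_results <- lapply(predictors, function(v) cor.test(Cars93$Price, Cars93[[v]]))
names(cor_results) <- predictors

# Collect the estimate and p-value of each test in a small table.
cor_summary <- data.frame(
  predictor = predictors,
  r = vapply(cor_results, function(res) unname(res$estimate), numeric(1)),
  p_value = vapply(cor_results, function(res) res$p.value, numeric(1)),
  row.names = NULL
)
print(cor_summary)
```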

Example for ANOVA:

We can reject the null hypothesis (at the 95% confidence level) that airbags have no impact on the price. Cars with no airbags cost less.

We can reject the null hypothesis (at the 95% confidence level) that transmission type has no impact on the price. Cars with no manual transmission cost less.

We cannot reject the null hypothesis (at the 95% confidence level) that country of origin has no impact on the price.
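The same kind of one-way ANOVA can be run by hand with `aov`. The helper `anova_p` is hypothetical and simply extracts the p-value of the F test; comparing it with 0.05 corresponds to the 95% confidence level used above.

```r
library(MASS)  # provides the Cars93 data set

# Hypothetical helper: p-value of a one-way ANOVA of Price on one factor.
anova_p <- function(factor_name, data = Cars93) {
  # reformulate() builds the formula Price ~ <factor_name>
  fit <- aov(reformulate(factor_name, response = "Price"), data = data)
  summary(fit)[[1]][["Pr(>F)"]][1]
}

p_values <- vapply(c("AirBags", "Man.trans.avail", "Origin"), anova_p, numeric(1))

# TRUE means the null hypothesis of equal group means is rejected at 5%.
p_values < 0.05
```

Group means, for example `tapply(Cars93$Price, Cars93$AirBags, mean)`, show the direction of any difference that the test flags.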

Categorical target

We can run a similar analysis on a categorical target. In this example, I am running the analysis on Origin:

AllSigTest(Cars93,"Origin")

Apart from ANOVA, the package also performs a chi-square test for categorical variables:

Example:

We can reject the null hypothesis (at the 95% confidence level) that transmission type is independent of the Origin. Non-USA countries produce a higher number of cars with manual transmission.

We can reject the null hypothesis (at the 95% confidence level) that car type is independent of the Origin. The USA produces a higher number of large cars and non-USA countries produce a higher number of smaller cars.

We cannot reject the null hypothesis (at the 95% confidence level) that airbag is independent of the Origin.
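A chi-square test of independence like the ones above can be reproduced with `chisq.test` on a contingency table. The helper `chisq_vs_origin` is hypothetical and shown only to illustrate the idea; it is not the package's code.

```r
library(MASS)  # provides the Cars93 data set

# Hypothetical helper: chi-square test of independence against Origin.
chisq_vs_origin <- function(var_name, data = Cars93) {
  counts <- table(data[[var_name]], data$Origin)  # contingency table
  chisq.test(counts)
}

res_trans <- chisq_vs_origin("Man.trans.avail")
res_trans$observed   # counts of manual-transmission availability by Origin
res_trans$p.value    # below 0.05 means independence is rejected
```

For variables with sparse levels, `chisq.test` may warn that the approximation could be inaccurate because some expected counts are small.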

Contributing

All feedback is appreciated.



SyedAkbarAhmad/AllSigTest documentation built on May 9, 2019, 3:27 p.m.