Univariate and Bivariate Analysis of Data
The package can be installed from GitHub using devtools:
devtools::install_github("SyedAkbarAhmad/AllSigTest")
The purpose of this package is to automatically run and view univariate and bivariate analyses of a dataset.
For this demo, I will use the Cars93 dataset from the MASS package. The Cars93 dataset describes 93 cars on sale in the USA in 1993. I am interested in finding the factors that contribute to the price of a vehicle.
library(AllSigTest)
library(MASS)
The package depends on the following packages, which should also be installed and loaded:
library(corrplot)
library(gplots)
First, I will remove identifier variables that won't contribute to this analysis (manufacturer, model, and make):
Cars93$Manufacturer <- NULL
Cars93$Model <- NULL
Cars93$Make <- NULL
Now, to run the analysis, pass the dataset and the name of the target variable (the variable we wish to run the tests on):
AllSigTest(Cars93,"Price")
The output is divided into two sections: univariate analysis and bivariate analysis.
Histograms are plotted for all numeric variables:
Example:
Bar plots are plotted for all character/factor variables:
Example:
A correlation plot is drawn for all numeric variables:
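AllSigTest's plotting code isn't shown in this post. The following is only a rough sketch of how the univariate step could be reproduced in base R. It splits the columns into numeric and categorical ones, draws a histogram or bar plot for each, and builds a correlation matrix. `plot_univariate()` is a hypothetical helper, not part of the package, and corrplot is used only if it is installed.

```r
library(MASS)  # provides the Cars93 data frame

car_data <- Cars93
car_data[c("Manufacturer", "Model", "Make")] <- NULL  # drop identifier columns

# Hypothetical helper: one plot per variable, plus a correlation plot
plot_univariate <- function(df) {
  is_num   <- vapply(df, is.numeric, logical(1))
  num_vars <- names(df)[is_num]
  cat_vars <- names(df)[!is_num]

  # Histogram for each numeric column (hist() drops NAs itself)
  for (v in num_vars) hist(df[[v]], main = v, xlab = v)

  # Bar plot of level counts for each factor/character column
  for (v in cat_vars) barplot(table(df[[v]]), main = v)

  # Correlation matrix of the numeric columns; pairwise deletion because
  # some Cars93 columns (e.g. Rear.seat.room, Luggage.room) contain NAs
  M <- cor(df[num_vars], use = "pairwise.complete.obs")
  if (requireNamespace("corrplot", quietly = TRUE)) corrplot::corrplot(M)

  invisible(list(numeric = num_vars, categorical = cat_vars, cor = M))
}

res <- plot_univariate(car_data)
```

Splitting on `is.numeric()` treats integer columns such as `Passengers` as numeric, while factors such as `Cylinders` get bar plots.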
Different tests are executed depending on the data type of the target variable. In this example, the target variable is Price, which is numeric, so the applicable analyses are ANOVA (for categorical variables) and correlation (for numeric variables).
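The post doesn't show how AllSigTest picks a test. The sketch below is a hypothetical version of that dispatch, inferred only from the examples here:

* correlation for numeric vs numeric
* ANOVA for numeric vs categorical (in either direction)
* chi-square for categorical vs categorical

`choose_test()` is an illustrative name, not a function exported by the package.

```r
library(MASS)  # provides the Cars93 data frame

# Hypothetical helper: choose a bivariate test from the two variable types
choose_test <- function(target, predictor) {
  num_t <- is.numeric(target)
  num_p <- is.numeric(predictor)
  if (num_t && num_p) {
    "correlation"   # numeric vs numeric
  } else if (num_t || num_p) {
    "anova"         # numeric vs categorical, either way round
  } else {
    "chi-square"    # categorical vs categorical
  }
}

choose_test(Cars93$Price, Cars93$Weight)   # "correlation"
choose_test(Cars93$Price, Cars93$AirBags)  # "anova"
choose_test(Cars93$Origin, Cars93$Type)    # "chi-square"
```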
Example for scatter plots:
Price had a strong positive correlation with the weight of the vehicle. Heavier vehicles are priced higher.
Price had a strong negative correlation with the MPG of the vehicle. Cars with better fuel economy are cheaper.
Price had a strong positive correlation with the HP of the vehicle. Cars with higher HP are more expensive.
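These correlations can be checked by hand with base R's `cor.test()`, which reports the coefficient and its p-value. The sketch makes two assumptions:

* "MPG" refers to the `MPG.city` column of Cars93 (there is also `MPG.highway`).
* A Pearson correlation is appropriate. The post doesn't say which coefficient AllSigTest uses.

```r
library(MASS)  # provides the Cars93 data frame

# Assumption: "MPG" in the text is taken to mean MPG.city
predictors <- c("Weight", "MPG.city", "Horsepower")

# One row per predictor: Pearson r and its p-value against Price
cor_results <- t(vapply(predictors, function(v) {
  ct <- cor.test(Cars93$Price, Cars93[[v]])  # Pearson by default
  c(r = unname(ct$estimate), p.value = ct$p.value)
}, numeric(2)))

print(round(cor_results, 4))

# Scatter plot with a least-squares line, e.g. for Weight
plot(Price ~ Weight, data = Cars93)
abline(lm(Price ~ Weight, data = Cars93))
```

The sign of `r` gives the direction described above, and the p-value indicates whether the correlation is distinguishable from zero.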
Example for ANOVA:
We can reject the null hypothesis (with 95% confidence) that airbags have no impact on the price. Cars with no airbags cost less.
We can reject the null hypothesis (with 95% confidence) that transmission type has no impact on the price. Cars without a manual transmission option cost less.
We cannot reject the null hypothesis (with 95% confidence) that country of origin has no impact on the price.
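The ANOVA results above can be checked with base R's `aov()`. This is a sketch, not the package's own code. It assumes "transmission type" refers to the `Man.trans.avail` column (whether a manual transmission is available) and "country of origin" to `Origin`. A p-value below 0.05 corresponds to rejecting the null hypothesis at the 95% level used in the text.

```r
library(MASS)  # provides the Cars93 data frame

# Assumption: "transmission type" corresponds to Man.trans.avail
cat_vars <- c("AirBags", "Man.trans.avail", "Origin")

# One-way ANOVA of Price on each categorical variable; keep the p-value
anova_p <- vapply(cat_vars, function(f) {
  fit <- aov(reformulate(f, response = "Price"), data = Cars93)
  summary(fit)[[1]][["Pr(>F)"]][1]  # p-value of the factor term
}, numeric(1))

print(data.frame(p.value = signif(anova_p, 3), reject_H0 = anova_p < 0.05))

# Group means show the direction of an effect, e.g. for AirBags
print(tapply(Cars93$Price, Cars93$AirBags, mean))
```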
We can run a similar analysis with a categorical target. In this example, I am running the analysis on Origin:
AllSigTest(Cars93,"Origin")
In addition to ANOVA, the package also performs a chi-square test for the categorical variables:
Example:
We can reject the null hypothesis (with 95% confidence) that transmission type is independent of Origin. Non-USA manufacturers produce more cars with a manual transmission option.
We can reject the null hypothesis (with 95% confidence) that car type is independent of Origin. USA manufacturers produce more large cars, while non-USA manufacturers produce more small cars.
We cannot reject the null hypothesis (with 95% confidence) that airbags are independent of Origin.
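The chi-square tests of independence can be reproduced with base R's `chisq.test()` on a contingency table of each variable against `Origin`. This is an illustrative sketch, not the package's code, and it again assumes "transmission type" means `Man.trans.avail`. For tables with small expected counts, such as `Type` by `Origin`, R warns that the chi-square approximation may be inaccurate.

```r
library(MASS)  # provides the Cars93 data frame

# Assumption: "transmission type" corresponds to Man.trans.avail
vars <- c("Man.trans.avail", "Type", "AirBags")

# Chi-square test of independence between each variable and Origin
chisq_p <- vapply(vars, function(v) {
  tab <- table(Cars93[[v]], Cars93$Origin)  # contingency table
  chisq.test(tab)$p.value                   # may warn if expected counts < 5
}, numeric(1))

print(signif(chisq_p, 3))

# Within each car type, the share of cars from USA vs non-USA makers
print(prop.table(table(Cars93$Type, Cars93$Origin), margin = 1))
```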
All feedback is appreciated.