Exploratory Data Analysis Report

library(rmarkdown)
library(SmartEDA)
library(knitr)
library(scales)
library(gridExtra)
library(ggplot2)

data <- params$data

Exploratory Data Analysis (EDA)

Analyzing data sets to summarize the main characteristics of their variables, often with visual graphs and without fitting a statistical model.

1. Overview of the data

Understanding the dimensions of the data set, the variable names, the overall missing-value summary, and the data type of each variable.

# Overview of the data
ExpData(data=data,type=1)
# Structure of the data
ExpData(data=data,type=2)
ovw_tabl <- ExpData(data=data,type=1)
ovw_tab2 <- ExpData(data=data,type=2)

Overview of the data

paged_table(ovw_tabl)

Structure of the data

paged_table(ovw_tab2)
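
For readers without the package at hand, the kind of information these two tables carry can be approximated in base R. The sketch below is only an illustration under assumptions: it uses the built-in mtcars data as a stand-in for params$data, and the column names of structure_tab are made up here, not taken from the ExpData output.

```r
# Stand-in data: the built-in mtcars data frame (NOT the report's params$data)
df <- mtcars
df$am <- factor(df$am, labels = c("automatic", "manual"))

# Dimensions of the data set
n_obs  <- nrow(df)
n_vars <- ncol(df)

# Per-variable structure: type, missing count and share, distinct values.
# Column names are illustrative only.
structure_tab <- data.frame(
  variable    = names(df),
  type        = vapply(df, function(x) class(x)[1], character(1)),
  n_missing   = vapply(df, function(x) sum(is.na(x)), integer(1)),
  pct_missing = vapply(df, function(x) round(mean(is.na(x)), 2), numeric(1)),
  n_distinct  = vapply(df, function(x) length(unique(x)), integer(1)),
  row.names   = NULL
)

print(c(observations = n_obs, variables = n_vars))
print(structure_tab)
```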

Target variable

Summary of continuous dependent variable

  1. Variable name - r Target
  2. Variable description - r label
summary(data[,Target])

2. Summary of numerical variables

snv_2 = ExpNumStat(data,by="GA",gp=Target,Qnt=seq(0,1,0.1),MesofShape=2,Outlier=TRUE,round=2)
rownames(snv_2)<-NULL

Summary statistics when the dependent variable r Target is continuous.

ExpNumStat(data,by="A",gp=Target,Qnt=seq(0,1,0.1),MesofShape=2,Outlier=TRUE,round=2)
paged_table(snv_2)
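
To make the arguments above more concrete, the following base-R sketch computes the same kinds of quantities for a single variable: deciles (as requested by Qnt=seq(0,1,0.1)), shape measures, and an outlier count. It assumes moment-based skewness and kurtosis and the common 1.5 * IQR outlier rule; the exact definitions used inside ExpNumStat may differ, and mtcars$hp is only stand-in data.

```r
# Stand-in variable (NOT from the report's data)
x <- mtcars$hp

# Deciles, mirroring Qnt = seq(0, 1, 0.1)
deciles <- quantile(x, probs = seq(0, 1, 0.1))

# Moment-based skewness and excess kurtosis (one common definition; assumption)
m <- mean(x)
s <- sqrt(mean((x - m)^2))
skewness <- mean((x - m)^3) / s^3
kurtosis <- mean((x - m)^4) / s^4 - 3

# Outliers by the 1.5 * IQR rule (assumed rule, for illustration)
q <- unname(quantile(x, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
lower_bound <- q[1] - fence
upper_bound <- q[2] + fence
n_outliers <- sum(x < lower_bound | x > upper_bound)

print(round(deciles, 2))
print(round(c(skewness = skewness, kurtosis = kurtosis), 2))
print(c(lower = lower_bound, upper = upper_bound, n_outliers = n_outliers))
```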

3. Distributions of numerical variables

Graphical representation of all numeric features. The following types of plots are used to explore the data.

Quantile-quantile plot for Numerical variables - Univariate

Quantile-quantile plot for all Numerical variables

ExpOutQQ(data,nlim=4,fname=NULL,Page=c(2,2),sample=sn)
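
As a point of comparison, a single normal quantile-quantile plot can be drawn with base R graphics. This is a minimal sketch on the built-in mtcars data, not a reproduction of the ExpOutQQ output; points that follow the reference line suggest an approximately normal distribution.

```r
# Stand-in variable (NOT from the report's data)
x <- mtcars$mpg

# qqnorm() draws sample quantiles against theoretical normal quantiles
# and invisibly returns the plotted coordinates
qq <- qqnorm(x, main = "Normal Q-Q plot: mpg")

# Reference line through the first and third quartiles
qqline(x, col = "red")
```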

Density plots for numerical variables - Univariate

Density plot for all numerical variables

ExpNumViz(data,target=NULL,nlim=10,fname=NULL,col=NULL,theme=theme,Page=c(2,2),sample=sn)

Scatter plot for all Numeric variables

Scatter plot between all numeric variables and the target variable r Target. This plot helps to examine how well the target variable is correlated with the independent variables in the data set.

ExpNumViz(data,target=NULL,nlim=5,Page=c(2,1),theme=theme,sample=sn,scatter=TRUE)

Correlation between the dependent variable and the independent variables

Dependent variable is r Target (continuous).

ExpNumViz(data,target=Target,nlim=5,fname=NULL,col=NULL,theme=theme,Page=c(2,2),sample=sn)

Correlation summary table

snv_22 = ExpNumStat(data,by="GA",gp=Target,MesofShape=2,Outlier=FALSE,round=2,dcast=T,val="cor")
rownames(snv_22)<-NULL
ExpNumStat(data,by="GA",gp=Target,MesofShape=2,Outlier=FALSE,round=2,dcast=T,val="cor")
paged_table(snv_22)
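
A correlation table of this kind can also be sketched in base R. The example below is an illustration under assumptions: it uses mtcars with mpg as a hypothetical stand-in for the Target parameter, computes Pearson correlations with cor(), and its layout is not meant to match what ExpNumStat returns.

```r
# Stand-in data and target (hypothetical; NOT the report's params)
df <- mtcars
target <- "mpg"

# All numeric columns except the target
is_num <- vapply(df, is.numeric, logical(1))
predictors <- setdiff(names(df)[is_num], target)

# Pearson correlation of each predictor with the target
cors <- vapply(predictors, function(v) cor(df[[v]], df[[target]]), numeric(1))

cor_tab <- data.frame(
  variable    = predictors,
  correlation = round(cors, 2),
  row.names   = NULL
)

# Strongest absolute correlation first
cor_tab <- cor_tab[order(-abs(cor_tab$correlation)), ]
print(cor_tab)
```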

4. Summary of categorical variables

Summary of categorical variables

et1 <- ExpCTable(data,Target=NULL,margin=1,clim=10,nlim=5,round=2,per=T)
rownames(et1)<-NULL
et11 <- ExpCTable(data,Target=Target,margin=1,clim=10,nlim=5,round=2,bin=4,per=T)
rownames(et11)<-NULL
ExpCTable(data,margin=1,clim=10,nlim=5,round=2,per=T)
paged_table(et1)
## bin=4: the continuous target is discretized into 4 categories based on quantiles
ExpCTable(data,Target=Target,margin=1,clim=10,nlim=5,round=2,bin=4,per=T)
paged_table(et11)
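
The quantile-based binning mentioned in the comment above can be illustrated in base R. This sketch assumes quartile cut points and row-wise percentages (analogous to margin=1); it uses mtcars with mpg as a hypothetical continuous target and gear as a categorical variable, and it does not reproduce the ExpCTable layout.

```r
# Stand-in data (NOT the report's params$data)
df <- mtcars
df$gear <- factor(df$gear)

# Discretize the continuous target into 4 quantile-based bins
breaks <- quantile(df$mpg, probs = seq(0, 1, 0.25))
df$mpg_bin <- cut(df$mpg, breaks = breaks, include.lowest = TRUE)

# Cross-tabulate the categorical variable against the binned target
counts <- table(gear = df$gear, mpg_bin = df$mpg_bin)

# Row percentages: the share of each target bin within a gear level
row_pct <- round(100 * prop.table(counts, margin = 1), 2)

print(counts)
print(row_pct)
```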

5. Distributions of Categorical variables

Graphical representation of all Categorical variables

Bar plot with vertical or horizontal bars for all categorical variables

ExpCatViz(data,clim=10,margin=2,theme=theme,Page = c(2,2),sample=sc)

SmartEDA documentation built on Dec. 4, 2022, 1:15 a.m.