univariate: Univariate analysis of variables



Description

The function returns a univariate analysis of the variables in a data frame. The statistics reported for each variable are: minimum, maximum, mean, median, number of distinct values, variable type, count of missing values, percentage of missing values, maximum population concentration (the largest share of observations falling in any single class or value), and correlation with the target. It also returns the names of the character and numerical variables, along with the names of variables whose population concentration at a single class or value exceeds a threshold.
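As a rough illustration of the statistics listed above, the following base R sketch computes them for a single column. The helper name `column_summary` is hypothetical and not part of the package. The use of Pearson correlation, the exclusion of missing values, and reporting missingness as a fraction are assumptions, not confirmed package behaviour.

```r
# Hypothetical helper: summarise one column x against a 0/1 target y.
# Assumptions: missing values are ignored in min/max/mean/median,
# perc_missing is a fraction, and corr is Pearson correlation.
column_summary <- function(x, y) {
  is_num <- is.numeric(x)
  counts <- table(x)  # frequency of each distinct non-missing value
  data.frame(
    var_min       = if (is_num) min(x, na.rm = TRUE) else NA,
    var_max       = if (is_num) max(x, na.rm = TRUE) else NA,
    mean          = if (is_num) mean(x, na.rm = TRUE) else NA,
    median        = if (is_num) median(x, na.rm = TRUE) else NA,
    var_vals      = length(counts),
    type          = class(x)[1],
    count_missing = sum(is.na(x)),
    perc_missing  = mean(is.na(x)),
    max_pop_conc  = max(counts) / length(x),
    corr          = if (is_num) cor(x, y, use = "complete.obs") else NA,
    stringsAsFactors = FALSE
  )
}

y <- rep(0:1, length.out = nrow(iris))  # placeholder 0/1 target
s <- column_summary(iris$Sepal.Length, y)
s
```

For a character column the numeric statistics and the correlation are returned as `NA`, which matches the shape of the `Species` row in the package's example output.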

Usage

univariate(base, target, threshold)

Arguments

base

input dataframe

target

name of the target column, passed as a string; the column must contain only 0/1 values

threshold

sparsity threshold, given as a decimal fraction (for example, 0.95)
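The constraints on the arguments can be checked before calling the function. The sketch below shows one way to do that. `check_univariate_args` is a hypothetical helper, not part of the package, and the assumption that `threshold` lies between 0 and 1 is inferred from the "decimal/fraction" wording.

```r
# Hypothetical pre-flight check for the three arguments.
# Assumptions: the target column must hold only 0/1 (missing values
# are rejected) and the threshold is a single number in [0, 1].
check_univariate_args <- function(base, target, threshold) {
  stopifnot(
    is.data.frame(base),
    is.character(target), length(target) == 1,
    target %in% names(base),
    all(base[[target]] %in% c(0, 1)),
    is.numeric(threshold), length(threshold) == 1,
    threshold >= 0, threshold <= 1
  )
  invisible(TRUE)
}

data <- iris
data$Y <- rep(0:1, length.out = nrow(data))  # placeholder 0/1 target
check_univariate_args(data, "Y", 0.95)
```

A call such as `check_univariate_args(data, "Y", 95)` stops with an error, which catches a threshold supplied as a percentage rather than a fraction.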

Value

The function returns an object of class "univariate" which is a list containing the following components:

univar_table

univariate summary of variables

num_var_name

array of column names of numerical type variables

char_var_name

array of column names of categorical type variables

sparse_var_name

array of column names where the population concentration at a single class or value is more than the sparsity threshold
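To make the sparsity rule concrete, the sketch below flags columns whose most frequent value covers more than `threshold` of the rows. The helpers `max_pop_conc` and `sparse_columns` are hypothetical, and the strict "greater than" comparison and the exclusion of missing values from the counts are assumptions based on the wording above rather than on the package source.

```r
# Share of rows taken by the most frequent (non-missing) value of a column.
max_pop_conc <- function(x) max(table(x)) / length(x)

# Names of columns whose concentration exceeds the threshold.
sparse_columns <- function(df, threshold) {
  conc <- vapply(df, max_pop_conc, numeric(1))
  names(conc)[conc > threshold]
}

sparse_columns(iris, 0.95)  # no column is this concentrated
sparse_columns(iris, 0.30)  # Species: each class holds one third of the rows
```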

Author(s)

Arya Poddar <aryapoddar290990@gmail.com>

Examples

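library(scorecardModelUtils)

# Build a small demo data set with one character column
data <- iris
data$Species <- as.character(data$Species)
# Random 0/1 target, so the corr column will differ between runs
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)

univariate_list <- univariate(base = data, target = "Y", threshold = 0.95)
univariate_list$univar_table     # per-variable summary table
univariate_list$num_var_name     # names of numerical variables
univariate_list$char_var_name    # names of character variables
univariate_list$sparse_var_name  # names of variables above the sparsity threshold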

Example output

           var var_min var_max     mean median var_vals      type count_missing perc_missing max_pop_conc        corr
1 Sepal.Length     4.3     7.9 5.843333   5.80       35   numeric             0            0   0.06666667  0.00124037
2  Sepal.Width     2.0     4.4 3.057333   3.00       23   numeric             0            0   0.17333333  0.02061405
3 Petal.Length     1.0     6.9 3.758000   4.35       43   numeric             0            0   0.08666667 -0.04615706
4  Petal.Width     0.1     2.5 1.199333   1.30       22   numeric             0            0   0.19333333 -0.06762063
5      Species      NA      NA       NA     NA        3 character             0            0   0.33333333          NA
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
[1] "Species"
character(0)
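The name vectors in the returned list can be used to subset the input data. The sketch below rebuilds the list by hand from the example output above, so that it runs without the package, then drops any sparse columns and selects the numerical ones. The hand-built list is only a stand-in for the real "univariate" object.

```r
data <- iris
data$Species <- as.character(data$Species)

# Stand-in for the object returned by univariate(), copied from the example output.
univariate_list <- list(
  num_var_name    = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  char_var_name   = "Species",
  sparse_var_name = character(0)
)

# Drop columns flagged as sparse (none at threshold 0.95 for iris).
keep <- setdiff(names(data), univariate_list$sparse_var_name)
reduced <- data[, keep, drop = FALSE]

# Keep only the numerical variables, e.g. for a correlation matrix.
num_data <- data[, univariate_list$num_var_name, drop = FALSE]
```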

scorecardModelUtils documentation built on May 2, 2019, 9:59 a.m.