tests_univariate: Perform a Univariate Analysis on a given dataset


Description

This function performs univariate analysis on a given dataset and a specified response variable. The dataset can be composed of mixed data types. The function models the response variable against each of the predictor variables in turn, using an appropriate GLM or non-parametric model. Currently the function supports Linear Regression, Logistic Regression, ANOVA, Loglinear Regression, Poisson Regression, Gamma Regression and Decision Trees. The results of the t-test / F-test / deviance test for overall significance, or the variable importance, are returned as a data frame. This data frame can be exported as a .csv file to a specified directory. The null hypothesis for the GLM models is that the predictor is not significant in explaining the response. The variable importance for decision trees is calculated by modelling the predictor variable as a root node and calculating the information gain with respect to the response variable.
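The one-model-per-predictor idea described above can be sketched in base R. This is a minimal illustration for the linear case only, not the package's implementation: the helper name `univariate_lm` and its output columns are assumptions made for this example.

```r
# Hypothetical helper (not part of the package): fit one linear model per
# predictor and collect the overall F-test for each fit.
univariate_lm <- function(dataset, response) {
  predictors <- setdiff(names(dataset), response)
  rows <- lapply(predictors, function(p) {
    # Build the formula response ~ p and fit a single-predictor model
    fit <- lm(reformulate(p, response = response), data = dataset)
    f <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
    data.frame(
      predictor = p,
      f_statistic = unname(f["value"]),
      p_value = unname(pf(f["value"], f["numdf"], f["dendf"],
                          lower.tail = FALSE)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

set.seed(1)
data <- cbind(norm = rnorm(n = 150, sd = 10, mean = 75), iris)
result <- univariate_lm(data, response = "norm")
print(result)
```

Each row of `result` corresponds to one predictor, which mirrors the one-row-per-predictor data frame the function is described as returning.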

Usage

tests_univariate(dataset, type = c("Linear", "Logistic", "ANOVA", "Loglinear",
  "Poisson", "Gamma", "Decision Tree"), response = NULL, method = c(NULL,
  "class", "anova", "poisson", "exp"), file_name = NULL, directory = NULL)

Arguments

dataset

The dataset on which the univariate analysis is performed.

type

The type of univariate / GLM analysis to perform. Linear Regression should be performed for a numeric predictor and a numeric response. Logistic Regression should be performed for a numeric predictor and a binary numeric response. ANOVA should be performed for a categorical predictor and a numeric response. Loglinear Regression should be performed for a categorical predictor and a categorical response. Poisson Regression should be performed for a numeric predictor and a numeric count response. Gamma Regression should be performed for a numeric predictor and a numeric gamma-distributed response. Decision Tree should be performed for either a numeric or categorical predictor and a numeric or categorical response.
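As a rough guide to how some of these types relate to base R model families, the sketch below maps a `type` string to a `glm()` family and runs a deviance test on a single predictor. The helper `family_for_type` is hypothetical and the mapping is an assumption for illustration; the package may construct its models differently, and Loglinear and Decision Tree are deliberately left out.

```r
# Hypothetical mapping from a type label to a glm() family.
# This is an illustrative assumption, not the package's internal logic.
family_for_type <- function(type) {
  switch(type,
    Linear   = gaussian(),
    ANOVA    = gaussian(),
    Logistic = binomial(),
    Poisson  = poisson(),
    Gamma    = Gamma(),
    stop("no GLM family sketched for type: ", type)
  )
}

set.seed(1)
data <- cbind(count = rpois(n = 150, lambda = 3), iris)

# Single-predictor Poisson model, followed by a deviance (Chi-squared) test
fit <- glm(count ~ Sepal.Length, data = data,
           family = family_for_type("Poisson"))
dev_test <- anova(fit, test = "Chisq")
print(dev_test)
```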

response

The name of the response variable in the dataset.

method

A character object indicating the method of modelling, one of 'class', 'anova', 'poisson' or 'exp'. Only used with Decision Tree type models. The default is NULL. See the rpart package for further details.
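For orientation, the same `method` values can be passed directly to `rpart::rpart()`. The sketch below fits two single-predictor trees on `iris` and reads rpart's own `variable.importance` component. It assumes the rpart package is installed, and rpart's importance measure is not necessarily the value that tests_univariate reports.

```r
library(rpart)

# Classification tree ("class") for a categorical response
fit_class <- rpart(Species ~ Petal.Length, data = iris, method = "class")

# Regression tree ("anova") for a numeric response
fit_anova <- rpart(Sepal.Length ~ Petal.Length, data = iris, method = "anova")

# Named numeric vectors: one entry per variable used in the tree
print(fit_class$variable.importance)
print(fit_anova$variable.importance)
```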

file_name

A character object indicating the file name when saving the data frame. The default is NULL. Note, the file name must include the .csv suffix.

directory

A character object specifying the directory where the data frame is to be saved as a .csv file.
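How `file_name` and `directory` might combine into an output path can be sketched with base R. The `results` data frame below holds made-up placeholder values purely for illustration, and the way the function itself writes the file is an assumption.

```r
# Placeholder results (illustrative values only, not real output)
results <- data.frame(
  predictor = c("Sepal.Length", "Sepal.Width"),
  p_value = c(0.5, 0.5)
)

directory <- tempdir()                  # any existing directory
file_name <- "univariate_results.csv"   # must include the .csv suffix

# Guard against a missing suffix before writing
stopifnot(grepl("\\.csv$", file_name))

path <- file.path(directory, file_name)
write.csv(results, file = path, row.names = FALSE)

# Read the file back to confirm the export
read_back <- read.csv(path)
print(read_back)
```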

Value

The results of the t-test / F-test / deviance test for overall significance, or the variable importance for decision trees, are returned as a data frame.

See Also

tests_chisq, tests_ks, tests_norm, tests_proptest, tests_t, tests_var, tests_wilcoxon

Examples

#-- Linear Regression --#

# For numeric predictor and numeric response
norm = rnorm(n = 150, sd = 10, mean = 75)
data = cbind(norm, iris)
tests_univariate(response = "norm", dataset = data, type = "Linear")

# Lung Capacity Example
tests_univariate(response = "Age", dataset = lungcap, type = "Linear")

#-- Logistic Regression --#

# For numeric predictor and binary numeric response
binary = sample(c(0,1), size = 150, replace = TRUE)
data = cbind(binary, iris)
tests_univariate(response = "binary", dataset = data, type = "Logistic")

# Lung Capacity Example
data = lungcap
data['Male'] = as.integer(data$Gender == 'male')
tests_univariate(response = "Male", dataset = data, type = "Logistic")

#-- ANOVA --#

# For categorical predictor and numeric response
cat = sample(c("A", "B", "C"), size = 150, replace = TRUE)
data = cbind(cat, iris)
tests_univariate(response = "Sepal.Width", dataset = data, type = "ANOVA")

# Lung Capacity Example
tests_univariate(response = "Age", dataset = lungcap, type = "ANOVA")

#-- Loglinear Regression --#

# For categorical predictor and categorical response
cat = sample(c("A", "B", "C"), size = 150, replace = TRUE)
data = cbind(cat, iris)
tests_univariate(dataset = data, response = "cat", type = "Loglinear")

# Lung Capacity Example
tests_univariate(response = "Gender", dataset = lungcap, type = "Loglinear")

#-- Poisson Regression --#

# For numeric predictor and numeric count response
count = rpois(n = 150, lambda = 3)
data = cbind(count, iris)
tests_univariate(dataset = data, response = "count", type = "Poisson")

#-- Gamma Regression --#

# For numeric predictor and numeric gamma distributed response
gamma = rgamma(n = 150, shape = 3)
data = cbind(gamma, iris)
tests_univariate(dataset = data, response = "gamma", type = "Gamma")

#-- Decision Tree --#

# Lung Capacity Example
# For numeric response and mixed predictors
tests_univariate(dataset = lungcap, response = "Age", type = "Decision Tree", method = "anova")
# For categorical response and mixed predictors
tests_univariate(dataset = lungcap, response = "Gender", type = "Decision Tree", method = "class")

oislen/BuenaVista documentation built on May 16, 2019, 8:12 p.m.