Many commonly used parametric statistical methods, such as t-tests, ANOVA and linear regression, assume that the dependent (response) variable is normally distributed within each group, or more generally at each combination of values, of the independent variables (predictors); in linear regression this amounts to normally distributed residuals. Testing the data for normality is therefore an important step before applying parametric statistical methods.
Graphical and statistical methods can be used to assess whether sample data were drawn from a normal population. In normality testing, it is important to remember that the null hypothesis is that the sample comes from a normal population (with the mean and variance estimated from the data). If we fail to reject this null hypothesis, for example because the resulting p-value is greater than a significance level of 0.05, we have no evidence against normality and can proceed with the appropriate parametric methods. Failing to reject does not prove that the data are normal, however; with small samples the test has little power to detect departures from normality. More generally, normality testing can be used to check whether any sample approximates a normally distributed population. More on this topic can be found here and here.
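The decision rule described above can be illustrated with base R's `stats::shapiro.test()`, independently of noRmtest. This is a minimal sketch under stated assumptions: the two samples are simulated for illustration (one normal, one exponential and therefore right-skewed), and the 0.05 significance level is a convention rather than a requirement.

```r
set.seed(42)  # make the simulated samples reproducible

# Two hypothetical samples: one from a normal population, one from a skewed one
samples <- list(
  normal = rnorm(100, mean = 5, sd = 2),
  skewed = rexp(100, rate = 1)
)

alpha <- 0.05  # conventional significance level, chosen here by assumption

# Shapiro-Wilk p-value for each sample (base R, package 'stats')
p_values <- sapply(samples, function(x) shapiro.test(x)$p.value)

# A small p-value is evidence against normality; a large one is only
# an absence of evidence against it, not proof of normality
decisions <- ifelse(p_values > alpha, "fail to reject H0", "reject H0")
print(data.frame(p_value = signif(p_values, 3), decision = decisions))
```

With 100 exponential draws the test should reject normality; the normal sample will usually, but not always, fail to reject, since about 5% of truly normal samples are rejected at this level.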
noRmtest is a package that tests your data for normality with both a graphical and a statistical method, on multiple variables at once. As the graphical method, it constructs quantile-quantile (Q-Q) plots so that you can see whether the points closely follow a straight line, which indicates that the data are approximately normally distributed. As the statistical method, it calculates the Shapiro-Wilk test statistic (W) along with the corresponding p-value. The Shapiro-Wilk test generally has better power than most other normality tests, as long as most of the values are unique; see here for more information. The package also estimates, by maximum likelihood, the parameters of the normal distribution that best fits your data.
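To show what a Q-Q plot actually compares, the sketch below builds its points by hand in base R: the sorted data (empirical quantiles) against standard normal quantiles at the plotting positions `ppoints(n)`, which is how `qqnorm()` places its points. This is an illustration, not noRmtest's implementation; the choice of `iris$Sepal.Length` and the use of the Q-Q correlation as a rough straightness summary are assumptions made for the example.

```r
# Build the points of a normal Q-Q plot by hand for one numeric vector
x <- iris$Sepal.Length
n <- length(x)

sample_q <- sort(x)                 # empirical quantiles (the ordered data)
theoretical_q <- qnorm(ppoints(n))  # standard normal quantiles, as qqnorm() uses

# Points close to a straight line suggest approximate normality;
# their correlation is one rough numeric summary of that straightness
qq_cor <- cor(theoretical_q, sample_q)
cat(sprintf("Q-Q correlation: %.4f\n", qq_cor))

# Base R draws the same points, plus a reference line through the quartiles
qqnorm(x, main = "Normal Q-Q plot: Sepal.Length")
qqline(x)
```

A correlation near 1 only says the points are close to a line; systematic curvature at the ends (heavy or light tails) is often easier to spot in the plot itself.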
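The package provides three functions:
make_qqplot(): constructs a Q-Q plot for each variable
shapiro_wilk(): computes the Shapiro-Wilk statistic and p-value for each variable
params_mle(): estimates the parameters of a normal distribution for each variable by maximum likelihood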
Consider the iris dataset with the variables Sepal.Length and Petal.Length:
# Load the packages quietly, then keep two numeric columns of iris
suppressPackageStartupMessages(library(tidyverse))
library(noRmtest)
simple_iris <- iris %>% select(Sepal.Length, Petal.Length)
head(simple_iris)
One exploratory analysis to consider for this dataset is the Q-Q plot. make_qqplot constructs Q-Q plots for multiple variables simultaneously.
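For comparison, a similar multi-variable view can be produced with base graphics alone. This is a sketch, not how make_qqplot works internally (its output and plotting system may differ); it rebuilds `simple_iris` with base subsetting so that it runs without the tidyverse, and draws one normal Q-Q panel per numeric column.

```r
# Base R sketch: one normal Q-Q panel per numeric column of a data frame
simple_iris <- iris[, c("Sepal.Length", "Petal.Length")]

# Keep only numeric columns, since Q-Q plots need numeric data
numeric_cols <- names(simple_iris)[vapply(simple_iris, is.numeric, logical(1))]

old_par <- par(mfrow = c(1, length(numeric_cols)))  # panels side by side
for (col in numeric_cols) {
  qqnorm(simple_iris[[col]], main = paste("Normal Q-Q:", col))
  qqline(simple_iris[[col]], col = "red")  # reference line through the quartiles
}
par(old_par)  # restore the previous graphics settings
```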
make_qqplot(simple_iris)
Similarly, shapiro_wilk assesses normality quantitatively by computing the Shapiro-Wilk statistic and p-value for each variable.
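The per-variable computation can be approximated in base R by applying `stats::shapiro.test()` to every column and collecting W and the p-value into a table. This is a sketch under the assumption that a variables-by-statistics matrix is a convenient layout; the actual return format of noRmtest's shapiro_wilk may be different.

```r
simple_iris <- iris[, c("Sepal.Length", "Petal.Length")]

# Run stats::shapiro.test() on each column and keep W and the p-value
sw_results <- t(vapply(
  simple_iris,
  function(x) {
    test <- shapiro.test(x)
    c(W = unname(test$statistic), p_value = test$p.value)
  },
  numeric(2)  # each call must return exactly two numbers
))

print(sw_results)  # one row per variable, columns W and p_value
```

Petal.Length in iris mixes species with very different petal sizes, so its distribution is clearly non-normal and its p-value comes out far below 0.05.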
shapiro_wilk(simple_iris)
If normality can reasonably be assumed for both variables, params_mle can be used to estimate the parameters of the normal distribution from which the variables were sampled.
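For a normal distribution the maximum likelihood estimates have a closed form: the sample mean, and a variance that divides by n rather than n - 1. The base R sketch below computes them directly and checks them by numerically minimising the negative log-likelihood with `optim()`. It illustrates the estimation principle only; it is not params_mle's code, and the log-sigma parameterisation is simply a choice that keeps the standard deviation positive during the search.

```r
x <- iris$Sepal.Length

# Closed-form normal MLEs: sample mean, and variance with divisor n (not n - 1)
mu_hat <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))

# Numerical check: minimise the negative log-likelihood with optim()
neg_log_lik <- function(par) {
  # par[1] is the mean; par[2] is log(sigma), so sigma stays positive
  -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
}
fit <- optim(c(median(x), log(sd(x))), neg_log_lik, method = "BFGS")

estimates <- rbind(
  closed_form = c(mean = mu_hat, sd = sigma_hat),
  optim       = c(mean = fit$par[1], sd = exp(fit$par[2]))
)
print(estimates)
```

Both rows should agree to several decimal places. Because of the n divisor, the MLE standard deviation is slightly smaller than `sd(x)`.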
params_mle(simple_iris)
All functions in noRmtest take the same input types: a data frame or a matrix. By convention, a data frame input should follow the tidy format (see the tidyr vignette for more information), with each variable in its own column; likewise, variables should be the columns of a matrix input. Note that the package is limited to numeric data and is incompatible with logical (boolean) and categorical data (character and factor columns).
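The input rules above can be expressed as a small validation function. The helper `check_numeric_input()` below is hypothetical and is not part of noRmtest; it is a sketch of how one might check, before calling the package, that a data frame or matrix contains only numeric columns.

```r
# Hypothetical helper (not part of noRmtest): accept only all-numeric
# data frames or matrices, mirroring the input rules described above
check_numeric_input <- function(data) {
  if (!(is.data.frame(data) || is.matrix(data))) {
    stop("input must be a data frame or a matrix")
  }
  ok <- if (is.matrix(data)) {
    is.numeric(data)                          # a matrix has a single type
  } else {
    all(vapply(data, is.numeric, logical(1)))  # check every column
  }
  if (!ok) {
    stop("all columns must be numeric; logical, character and factor columns are not supported")
  }
  invisible(TRUE)
}

check_numeric_input(iris[, 1:4])             # passes: four numeric columns
check_numeric_input(as.matrix(iris[, 1:4]))  # passes: numeric matrix
try(check_numeric_input(iris))               # fails: Species is a factor
```

`is.numeric()` returns FALSE for logical vectors and factors, which is why both are rejected here.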