Contributors: Constantin Shuster, Sylvia Lee, Richie Zitomer
This is an R package that tests your data for normality!
A common and important assumption made by many widely used parametric statistical methods (t-tests, ANOVA and linear regression) is that the dependent (response) variable is normally distributed within each category of the independent variables (predictors). For linear regression, strictly speaking, the assumption concerns the model's errors, which are checked through the residuals. Testing for normality is therefore an important step before applying parametric statistical methods.
Both graphical and statistical methods can be used to assess whether sample data were drawn from a normal population. In a normality test such as Shapiro-Wilk, the null hypothesis is that the sample was drawn from a normally distributed population. If we fail to reject this null hypothesis (conventionally, if the p-value is greater than 0.05), the test has found no evidence against normality, and we can reasonably apply the appropriate parametric statistical methods to our data. Keep in mind that a large p-value does not prove the data are normal; it only means the test did not detect a departure from normality. More generally, normality testing can be used to check whether any sample approximates a normally distributed population. More on this topic can be found here and here.
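The decision rule above can be sketched with base R's stats::shapiro.test(). The two samples are hypothetical and deterministic: theoretical standard normal quantiles (about as normal as a sample can look) and theoretical exponential quantiles (strongly right-skewed). The helper check_normality() is our own illustration, not part of noRmtest, and the 0.05 cutoff is just the usual convention.

```r
# Hypothetical helper illustrating the p-value decision rule with base R.
check_normality <- function(x, alpha = 0.05) {
  result <- stats::shapiro.test(x)
  list(
    W = unname(result$statistic),
    p_value = result$p.value,
    # TRUE means we reject H0 (normality); FALSE only means "no evidence against normality"
    reject_normality = result$p.value <= alpha
  )
}

# Deterministic example samples of size 50
normal_like <- qnorm(ppoints(50))  # standard normal quantiles
skewed <- qexp(ppoints(50))        # exponential quantiles, right-skewed

normal_result <- check_normality(normal_like)
skewed_result <- check_normality(skewed)

print(normal_result)
print(skewed_result)
```

Returning a list rather than printing a verdict keeps W and the p-value available for reporting alongside the decision.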
This package tests your data for normality using both a graphical and a statistical method. The graphical method is a quantile-quantile (Q-Q) plot, which lets you check whether the points lie close to a straight line; if they do, the data are approximately normally distributed. The statistical method is the Shapiro-Wilk test, which reports the test statistic W along with the corresponding p-value. The Shapiro-Wilk test has better power than most other statistical normality tests, as long as most of the values are unique; see here for more information. Finally, the package estimates the mean and variance of the normal distribution that best fits your data using maximum likelihood estimation (MLE).
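For reference, the MLE of a normal distribution has a well-known closed form. The block below gives the log-likelihood of a sample x_1, ..., x_n and its maximizers. This is the standard textbook result, not a description of how params_mle() is implemented.

```latex
\ell(\mu, \sigma^2) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2

\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2
```

The variance estimate divides by n, not n - 1. For the example data c(1, 2, 3, 20), this gives a mean of 6.5 and an MLE variance of 245/4 = 61.25, whereas R's var(), which divides by n - 1, returns 245/3 ≈ 81.67.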
library(noRmtest)
data <- c(1,2,3,20) # Very simple example of a very non-normal dataset
(params_mle(data)) # Returns a data frame with the MLE mean and variance, assuming the data are normal
(shapiro_wilk(data)) # Returns a list with the Shapiro-Wilk statistic and the p-value
(make_qqplot(data)) # Returns a Q-Q plot to check for normality
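For comparison, base R's qqnorm() and qqline() draw a similar Q-Q plot without the package. This sketch is not necessarily how make_qqplot() builds its plot, which may use a different plotting system. Summarizing the Q-Q points with a correlation coefficient is our own illustrative addition, not a noRmtest feature.

```r
x <- c(1, 2, 3, 20)  # same toy data as the usage example

# Sample values (y-axis) against theoretical standard normal quantiles (x-axis)
qqnorm(x, main = "Normal Q-Q plot")
qqline(x, col = "red")  # reference line through the first and third quartiles

# The same points without drawing, for a numeric check
qq_points <- qqnorm(x, plot.it = FALSE)

# Correlation of the Q-Q points: values close to 1 mean the points lie near a straight line
qq_correlation <- cor(qq_points$x, qq_points$y)
print(qq_correlation)
```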
To install the package, first make sure devtools is installed and loaded by running the following in an R session:
install.packages("devtools")
library(devtools) # load devtools
Then run the following command to install our package:
devtools::install_github("UBC-MDS/noRmtest", build_opts = c("--no-resave-data", "--no-manual"))
Then load the package with library(noRmtest) to use its functions.
The package provides the following functions:
make_qqplot()
shapiro_wilk()
params_mle()
We have tests to ensure that our package works as expected now and in the future. The test suite achieves close to full branch coverage.
The stats package in R includes a Shapiro-Wilk test function, shapiro.test(). Its input is a numeric vector, and its output is a list containing the test statistic and the p-value. The ggpubr package provides ggqqplot(), which takes a data frame as input and returns a ggplot2 object. The car package also has a qqPlot() function that can make a Q-Q plot of any data. Additionally, the stats4 package provides mle(), which performs maximum likelihood estimation. However, this function may be less intuitive because it requires users to write the negative log-likelihood function themselves.
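To illustrate the point about mle(): with stats4 you supply the negative log-likelihood yourself, while for the normal distribution the same estimates also have a closed form. The sketch below is our own illustration, not code from noRmtest. It fits both on the usage example's data and compares them. Estimating log(sigma) instead of sigma is our choice, made to keep the standard deviation positive.

```r
library(stats4)  # ships with base R; provides mle()

x <- c(1, 2, 3, 20)  # same toy data as the usage example

# mle() minimizes a user-supplied NEGATIVE log-likelihood
neg_log_lik <- function(mu, log_sigma) {
  -sum(dnorm(x, mean = mu, sd = exp(log_sigma), log = TRUE))
}

fit <- mle(neg_log_lik, start = list(mu = mean(x), log_sigma = log(sd(x))))
est <- coef(fit)
mle_mean <- unname(est["mu"])
mle_variance <- unname(exp(est["log_sigma"])^2)

# Closed-form normal MLEs (note the 1/n, not 1/(n - 1), in the variance)
closed_mean <- mean(x)
closed_variance <- mean((x - mean(x))^2)

print(c(mle_mean = mle_mean, closed_mean = closed_mean))
print(c(mle_variance = mle_variance, closed_variance = closed_variance))
```

Reparameterizing on the log scale is one common way to keep sigma positive. Another is to pass lower bounds with method = "L-BFGS-B".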
noRmtest is designed to centralize operations related to normality assumption testing and parameter estimation. Although every function in noRmtest has an equivalent elsewhere, this package spares users the hassle of importing functions from several different packages that they may or may not have installed.