knitr::opts_chunk$set(echo = TRUE)
getwd()
# Holds the entire data set
mpg.df <- read.csv("EPAGAS.csv")
# Holds the first 6 lines of the data set
mpg_6 <- head(mpg.df, 6)
mpg_6
mpg <- mpg.df$MPG
head(mpg)
z <- (mpg - mean(mpg))/sd(mpg)
head(z)
print(paste0("z-bar = ", round(mean(z), digits=4)), quote = FALSE)
print(paste0("variance = ", var(z)), quote = FALSE)
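# Observations between 2 and 3 standard deviations from the mean
mpg[abs(z) >= 2 & abs(z) <= 3]
# Observations more than 3 standard deviations from the mean
mpg[abs(z) > 3]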
library(lattice)
# Red: |z| > 3; blue: 2 <= |z| <= 3; black: everything else
cols <- ifelse(abs(z) > 3, "Red", ifelse(abs(z) >= 2, "Blue", "Black"))
lattice::dotplot(mpg, col = cols)
boxplot(mpg, notch = TRUE, horizontal = TRUE, col = "Black", main = "MPG Boxplot")
According to Chebyshev's inequality, if $k = 2$ then $\frac{N(S_k)}{n} \ge 1 - \frac{1}{k^2} = 1 - \frac{1}{4} = \frac{3}{4}$, so at least $75\%$ of the data will lie within 2 standard deviations of the mean.
x <- mpg[abs(z) <= 2]
# Convert the count to a percentage of all observations
pct <- 100 * length(x) / length(mpg)
print(paste0(pct, "% of data is within 2 std deviations"))
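According to Chebyshev's inequality, the proportion of data at least $k$ standard deviations from the mean is at most $\frac{1}{k^2}$. For $k = 2$: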
k <- 2
# TRUE if the proportion at least k std deviations from the mean is at most 1/k^2
mean(abs(z - mean(z)) >= k*sd(z)) <= 1/k^2
Yes, the data agree with Chebyshev's inequality.
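The same check can be repeated for several values of $k$. The sketch below is only an illustration: the vector `x` is a made-up sample (not the EPAGAS data), and `prop_within` is a hypothetical helper name. It uses the strict form of the bound, so it counts observations with $|z| < k$.

```r
# Hypothetical sample for illustration only; swap in the real vector (e.g. mpg).
x <- c(30.0, 31.8, 32.5, 32.7, 32.9, 33.1, 33.6, 33.9, 34.0, 34.2,
       34.4, 34.5, 34.8, 35.0, 35.1, 35.3, 35.6, 35.9, 36.0, 36.3,
       36.5, 36.8, 37.0, 37.1, 37.3, 37.9, 38.2, 38.6, 39.0, 44.9)

# Proportion of observations strictly within k sample standard deviations
# of the sample mean (prop_within is a hypothetical helper).
prop_within <- function(x, k) {
  z <- (x - mean(x)) / sd(x)
  mean(abs(z) < k)
}

ks <- c(1.5, 2, 3)
observed <- sapply(ks, function(k) prop_within(x, k))
bound <- 1 - 1 / ks^2  # Chebyshev lower bound for each k

# One row per k: observed proportion, bound, and whether the bound holds.
data.frame(k = ks, observed = observed,
           chebyshev_lower_bound = bound,
           satisfied = observed >= bound)
```

Because Chebyshev's inequality holds for any data set, the `satisfied` column should be `TRUE` in every row; the comparison is more useful for seeing how loose the bound is than for testing the data.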
According to the empirical rule, approximately 95% of the data should lie within 2 standard deviations of the mean.
We saw earlier that 96% of the data in EPAGAS.csv lie within 2 standard deviations of the mean, so the data agree well with the empirical rule.
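For reference, the empirical rule's percentages come from the normal distribution. The sketch below compares them with simulated normal data; the sample size, mean, standard deviation, and seed are arbitrary assumptions chosen for illustration and are not taken from EPAGAS.csv.

```r
# Simulated normal data; all parameters here are arbitrary illustrative choices.
set.seed(1)
sim <- rnorm(10000, mean = 37, sd = 2.4)
z_sim <- (sim - mean(sim)) / sd(sim)

ks <- 1:3
# Observed proportion of simulated values within k standard deviations
observed <- sapply(ks, function(k) mean(abs(z_sim) <= k))
# Theoretical normal proportions (the source of the 68-95-99.7 figures)
expected <- pnorm(ks) - pnorm(-ks)

round(data.frame(k = ks, observed = observed, expected = expected), 4)
```

Replacing `z_sim` with the z-scores of a real sample gives a quick view of how closely that sample follows the rule at 1, 2, and 3 standard deviations.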
The rule makes two assumptions: 1) the distribution is unimodal; 2) the distribution is symmetric about the mode.
Yes. There is a slight skew (0.0499), but a value this small is generally acceptable for an approximately normal distribution, so I think both assumptions hold.
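The report does not show how the skew value was obtained. One common definition is the moment-based sample skewness, sketched below in base R; `skewness_g1` is a hypothetical helper name, and packages such as `e1071` or `moments` provide their own `skewness()` functions whose defaults may give slightly different values.

```r
# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 3/2.
# (skewness_g1 is a hypothetical helper, not part of the original report.)
skewness_g1 <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skewness_g1(c(1, 2, 3, 4, 5))   # symmetric sample: skewness of zero
skewness_g1(c(1, 1, 2, 3, 10))  # long right tail: positive skewness
```

Applied to the MPG vector, `skewness_g1(mpg)` would give a value near zero if the distribution is roughly symmetric.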