knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
# directory to Lab
dirdl <- system.file("Lab2", package = "Intro2R")
# create rmd link
library(Intro2R)
dir(dirdl)
`r rmdfile("Lab 2","MATH4753Laboratory2.docx","Lab2")`
This lab will cement a number of R skills and some of the theory related to the following:
For the empirical rule you will need to be able to calculate means and standard deviations.
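The empirical rule says that for mound-shaped data roughly 68%, 95% and 99.7% of the values fall within 1, 2 and 3 standard deviations of the mean. A minimal sketch of computing those intervals in base R, using a small hypothetical vector `x` (not the `ddt` data) in place of a real variable:

```r
# Hypothetical sample (not the ddt data); swap in e.g. ddt$DDT
x <- c(rep(c(9, 11), 20), 14, 6, 16, 4)

xbar <- mean(x)   # sample mean
s <- sd(x)        # sample standard deviation (divides by n - 1)

# Empirical-rule intervals from xbar - k*s to xbar + k*s for k = 1, 2, 3
k <- 1:3
intervals <- cbind(lower = xbar - k * s, upper = xbar + k * s)
rownames(intervals) <- paste0("k = ", k)
intervals
```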
You can make z values in a number of different ways.
Suppose we wish to make z values from the DDT variable in the ddt data frame.
d <- ddt$DDT               # the DDT measurements
z <- (d - mean(d))/sd(d)   # subtract the mean, divide by the standard deviation
head(z)
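In symbols, the code above computes, for each observation $x_i$ in a sample of size $n$, the quantities below (`sd()` uses the $n-1$ denominator shown for $s$). A z value therefore counts how many standard deviations an observation lies from the mean, so $|z_i| < k$ is the same as $x_i$ lying strictly between $\bar{x} - ks$ and $\bar{x} + ks$.

$$
z_i = \frac{x_i - \bar{x}}{s}, \qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
$$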
We can also use the built-in function scale():
zmat <- scale(ddt$DDT)  # scale makes a matrix of z values
z <- zmat[,1]           # take the column to form a vector
head(z)
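To convince yourself that scale() gives the same z values as the formula, here is a small check on a hypothetical vector `x` (not the `ddt` data). scale() also records the mean and standard deviation it used in the attributes `"scaled:center"` and `"scaled:scale"`.

```r
x <- c(rep(c(9, 11), 20), 14, 6, 16, 4)  # hypothetical data

z_manual <- (x - mean(x)) / sd(x)  # z values from the formula
zmat <- scale(x)                   # one-column matrix of z values
z_scale <- zmat[, 1]               # first column as a plain vector

all.equal(z_manual, z_scale)       # TRUE: both methods agree
attr(zmat, "scaled:center")        # mean that was subtracted (10 here)
attr(zmat, "scaled:scale")         # standard deviation that was divided by
c(mean = mean(z_scale), sd = sd(z_scale))  # approximately 0 and 1
```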
To show how many data values lie within k standard deviations of the mean, you will need to count the number of values in the interval from the mean minus k standard deviations to the mean plus k standard deviations.
mn <- mean(ddt$DDT)
sdd <- sd(ddt$DDT)
mp <- c(-1, 1)
k <- 2
mn + mp*k*sdd   # lower and upper limits of the interval
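Once the limits are known, a logical comparison does the counting: TRUE counts as 1 and FALSE as 0, so sum() gives the count and mean() gives the proportion. A sketch on a hypothetical vector `x`, using strict inequalities to match an `abs(z) < 2` condition:

```r
x <- c(rep(c(9, 11), 20), 14, 6, 16, 4)  # hypothetical data
mn <- mean(x)
sdd <- sd(x)
k <- 2
lims <- mn + c(-1, 1) * k * sdd      # lower and upper limits of the interval

inside <- x > lims[1] & x < lims[2]  # TRUE where a value is strictly inside
sum(inside)                          # count of values in the interval
mean(inside) * 100                   # percentage of values in the interval
```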
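We can subset the data frame with abs(z) < 2 and then pull off the DDT column. The function length() returns the number of values in a vector, so dividing the two lengths and multiplying by 100 gives the percentage of values inside the interval.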
ddl2 <- ddt[abs(z) < 2, "DDT"]        # DDT values with |z| < 2
length(ddl2)/length(ddt$DDT)*100      # percentage within 2 standard deviations
About 98.6% of the DDT values lie within 2 standard deviations of the mean.
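The 98.6% above is higher than the roughly 95% the empirical rule predicts for mound-shaped data. To compare a sample with all three empirical-rule percentages at once, compute the percentage of |z| values below 1, 2 and 3. This sketch uses simulated normal data, an assumption made only so the rule should hold approximately; the seed is arbitrary.

```r
set.seed(4753)                         # arbitrary seed so the sample is reproducible
x <- rnorm(1000, mean = 50, sd = 10)   # hypothetical mound-shaped data

z <- (x - mean(x)) / sd(x)

# Percentage of observations with |z| < k for k = 1, 2, 3
observed <- sapply(1:3, function(k) mean(abs(z) < k) * 100)
empirical <- c(68, 95, 99.7)           # approximate empirical-rule percentages
round(cbind(k = 1:3, observed, empirical), 1)
```

Swapping in `ddt$DDT` for `x` gives the same comparison for the lab data.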
The dotplot is a simple device: we can set bins of a certain size (I will use a bin width of sd(LENGTH)/5) to break the continuous data into discrete levels. In addition, we will create labels for the regions of LENGTH that lie an integral number of standard deviations from the mean, first with nested ifelse() calls on z and then with the cut() function.
See the code below:
library(ggplot2)
library(dplyr)
mn <- mean(ddt$LENGTH)
sdd <- sd(ddt$LENGTH)
# add z values and a three-level outlier label
ddt <- ddt %>%
  mutate(z = (LENGTH - mn)/sdd,
         Far = ifelse(abs(z) > 3, "Outlier",
                      ifelse(abs(z) >= 2 & abs(z) <= 3, "Possible Out.", "MAIN")))
g <- ggplot(ddt, aes(x = LENGTH)) +
  geom_dotplot(aes(fill = Far), binwidth = 1/5*sdd)
g <- g + geom_density(aes(y = after_stat(count)))
g <- g + labs(title = "LENGTH data categorized by outlier status using z")
g
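Stripped of the plotting, the labelling rule in the mutate() call is a pair of nested ifelse() calls on |z|. A base-R sketch on a hypothetical vector `x`, so the rule can be checked with table() before plotting:

```r
x <- c(rep(c(9, 11), 20), 14, 6, 16, 4)  # hypothetical data
z <- (x - mean(x)) / sd(x)

# |z| > 3: outlier; 2 <= |z| <= 3: possible outlier; otherwise main body.
# The outer test already catches |z| > 3, so the inner test only needs |z| >= 2.
Far <- ifelse(abs(z) > 3, "Outlier",
              ifelse(abs(z) >= 2, "Possible Out.", "MAIN"))
table(Far)        # counts in each category
x[Far != "MAIN"]  # the flagged values themselves
```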
Or we could just cut the LENGTH variable as below:
library(ggplot2)
library(dplyr)
mn <- mean(ddt$LENGTH)
sdd <- sd(ddt$LENGTH)
# breaks at min - 1, mn - 3sd, ..., mn + 3sd, max; duplicated labels merge into 3 levels
ddt <- ddt %>%
  mutate(Lcut = cut(LENGTH,
                    c(min(LENGTH) - 1, seq(mn - 3*sdd, mn + 3*sdd, by = sdd), max(LENGTH)),
                    labels = c("Outlier", "Possible Out", "Main", "Main",
                               "Main", "Main", "Possible Out", "Outlier")))
g <- ggplot(ddt, aes(x = LENGTH)) +
  geom_dotplot(aes(fill = Lcut), binwidth = 1/5*sdd)
g <- g + geom_density(aes(y = after_stat(count)))
g <- g + labs(title = "LENGTH data categorized by outlier status")
g
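The cut() approach works because duplicated labels are merged into a single factor level, so eight intervals become three levels. A base-R sketch on a hypothetical vector `x`; here the outer breaks are -Inf and Inf rather than min(LENGTH) - 1 and max(LENGTH). That is a swapped-in alternative: cut() sorts its breaks, so if no value lay beyond mn + 3*sdd the maximum would be sorted below that break and the labels would shift onto the wrong intervals.

```r
x <- c(rep(c(9, 11), 20), 14, 6, 16, 4)  # hypothetical data
mn <- mean(x)
sdd <- sd(x)

# Breaks at mn - 3*sdd, ..., mn + 3*sdd; -Inf and Inf catch anything beyond 3 sd
brks <- c(-Inf, mn + (-3:3) * sdd, Inf)
Lcut <- cut(x, breaks = brks,
            labels = c("Outlier", "Possible Out", "Main", "Main",
                       "Main", "Main", "Possible Out", "Outlier"))
levels(Lcut)  # duplicated labels collapse to three levels
table(Lcut)
```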
You will need the following files:
dir(dirdl)
Make sure you place them all in Lab2.
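Since the setup chunk shows the lab files are installed with the package (`dirdl`), one option is to copy them into a local Lab2 folder instead of downloading them. A minimal sketch; the helper name `copy_lab_files()` is hypothetical and not part of Intro2R.

```r
# Hypothetical helper (not part of Intro2R): copy the top-level files in
# `from` into the folder `to`, creating `to` if it does not exist yet.
copy_lab_files <- function(from, to = "Lab2") {
  dir.create(to, showWarnings = FALSE)
  files <- list.files(from, full.names = TRUE)
  files <- files[!dir.exists(files)]      # skip any subfolders
  file.copy(files, to, overwrite = TRUE)  # one TRUE/FALSE per file
}

# Usage (assumes Intro2R is installed, so dirdl is a real folder):
# dirdl <- system.file("Lab2", package = "Intro2R")
# copy_lab_files(dirdl, "Lab2")
```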
`r rmdfile("EPAGAS", "EPAGAS.XLS","Lab2")`
`r rmdfile("Lab2.R", "Lab2.R", "Lab2")`
`r rmdfile("Lab document","MATH4753Laboratory2.docx","Lab2")`