knitr::opts_chunk$set(echo = TRUE)
library(wildlifeR)
library(ggplot2)
library(cowplot)
library(ggpubr)
library(dplyr)
data(frogarms)
The function make_my_data2L() will extract a random subset of the data. Change "my.code" to your school email address, minus the "@pitt.edu" or whatever your affiliation is.
my.frogs <- make_my_data2L(dat = frogarms,
                           my.code = "nlb24", # <= change this!
                           cat.var = "sex",
                           n.sample = 20,
                           with.rep = FALSE)
n.sample is set to 20, which extracts 20 unique individuals of each sex. Check that your dataframe has 2*20 = 40 rows using the dim() command.
dim(my.frogs)
dim(my.frogs)
nrow(my.frogs)
ncol(my.frogs)
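To make the "20 unique individuals of each sex" idea concrete, here is a minimal base-R sketch of drawing a fixed number of rows per group without replacement. It is not the internals of make_my_data2L(), which are not shown here. The toy dataframe and the names toy, rows.by.sex and toy.sub are invented for illustration.

```r
# Toy data standing in for frogarms (all values are made up)
set.seed(1)
toy <- data.frame(sex  = rep(c("female", "male"), each = 50),
                  mass = round(rnorm(100, mean = 25, sd = 3), 1))

n.sample <- 20

# Row numbers grouped by sex, then n.sample rows drawn from each group
rows.by.sex <- split(seq_len(nrow(toy)), toy$sex)
rows <- unlist(lapply(rows.by.sex, sample, size = n.sample, replace = FALSE))

toy.sub <- toy[rows, ]

dim(toy.sub)        # 40 rows (2 * 20) and 2 columns
table(toy.sub$sex)  # 20 of each sex
```

Because sampling is done without replacement, no row of the toy data can appear twice in the subset.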
head(my.frogs)
tail(my.frogs)
names(my.frogs)
?frogarms
R is a giant calculator
Whole dataframe
summary(my.frogs)
Just a single column
summary(my.frogs$mass)
Can compare your subset to the original data
summary(my.frogs$mass)
summary(frogarms$mass)
Handy trick: stack up the data with rbind()
rbind(summary(my.frogs$mass), summary(frogarms$mass))
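A small variation on the stacking trick, sketched with made-up numbers: if the arguments to rbind() are named, the names become row labels, which makes it easier to tell the two summaries apart. The vectors full.mass and sub.mass and the labels mine and original are assumptions for illustration only.

```r
# Made-up masses standing in for the original data and a subset of it
set.seed(2)
full.mass <- rnorm(200, mean = 25, sd = 3)
sub.mass  <- sample(full.mass, size = 40)

# Named arguments to rbind() become the row names of the result
both <- rbind(mine     = summary(sub.mass),
              original = summary(full.mass))
both
```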
mean(my.frogs$mass)
var(my.frogs$mass)
range() returns two values in a vector
range(my.frogs$mass)
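Since range() returns a vector of length two, its pieces can be pulled out with square brackets, and diff() turns it into a single number for the spread. A minimal sketch with made-up masses (the vector x is not from the frog data):

```r
x <- c(3.2, 9.1, 4.5, 7.0)   # made-up masses

r <- range(x)   # vector of length 2: minimum, then maximum
r[1]            # the minimum, 3.2
r[2]            # the maximum, 9.1
diff(r)         # maximum minus minimum
```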
Note that base R doesn't have a function for a very common statistic, the standard error (SE). This can be calculated by hand as the standard deviation divided by the square root of the sample size.
sd(my.frogs$mass)/sqrt(length(my.frogs$mass))
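To see what that one-liner is doing, here is the same calculation broken into steps on a small made-up vector (x is not from the frog data). The sum of squared deviations from the mean is 32 and n is 8, so the variance is 32/7 and the SE works out to sqrt(4/7).

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values; mean is 5

n  <- length(x)   # sample size, 8
s  <- sd(x)       # sample standard deviation, sqrt(32/7)
se <- s / sqrt(n) # standard error of the mean

se                # about 0.756
```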
Write a function
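# Note: despite the "sd" in their names, these functions return the SE

# Version 1: takes a single column
my_sd1 <- function(dat_column){
  sd(dat_column)/sqrt(length(dat_column))
}

# Version 2: takes a dataframe and a column name
my_sd2 <- function(dat, column){
  sd(dat[,column])/sqrt(length(dat[,column]))
}

# Version 3: like version 2, but rounds the result
my_sd3 <- function(dat, column, digits.round = 3){
  se <- sd(dat[,column])/sqrt(length(dat[,column]))
  round(se, digits = digits.round)
}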
my_sd2(dat = my.frogs, column = "mass")
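One thing to watch for with hand-written SE functions: if a column contains missing values, sd() returns NA by default and length() still counts the NAs. The sketch below shows one possible way to handle that by dropping NAs first. The function name se_narm is hypothetical and the data are made up.

```r
# Hypothetical helper: drop missing values, then compute the SE
se_narm <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

x <- c(2, 4, 4, NA, 5, 5, 7, 9)   # made-up values with one NA

sd(x) / sqrt(length(x))   # NA, because sd() does not skip NAs by default
se_narm(x)                # SE based on the 7 non-missing values
```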
dplyr is a package that provides numerous functions for manipulating data. We will use two handy functions: group_by() and summarise().
dplyr can use a handy syntax that involves "pipes". You can string together R commands using the pipe operator %>%
When using pipes, you start with a dataframe and follow it with an action you want done to it. So, for example, previously when we wanted the mean of the mass column we did this
mean(my.frogs$mass)
This reads like a normal mathematical equation or function, where you start from inside the parentheses and work outward. R lets you nest as many functions as you want. If I want to round my mean, I wrap "mean(my.frogs$mass)" in round(...)
round(mean(my.frogs$mass))
Using pipes to get the mean I write things more like a sentence:
my.frogs$mass %>% mean() # note the parentheses
Which reads kind of like "Take the mass column of the dataframe and apply the mean() function to it." Note that the parentheses are included even though there is nothing in them.
To round the mean we would do this
my.frogs$mass %>% mean() %>% round()
Which, read left to right like a sentence, is "Take the mass column, calculate the mean, and then round it."
Note that the round() command has an argument for how many digits you want to round to. You include that in the parentheses
my.frogs$mass %>% mean() %>% round(digits = 2)
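To check that the nested form and the piped form really are the same calculation, here is a minimal sketch on a made-up vector (x is not from the frog data). It assumes dplyr is installed, since loading dplyr makes %>% available.

```r
library(dplyr)   # makes the %>% pipe available

x <- c(24.13, 26.58, 22.91, 27.04)   # made-up masses

# Nested: read from the inside out
nested <- round(mean(x), digits = 2)

# Piped: read from left to right
piped <- x %>% mean() %>% round(digits = 2)

identical(nested, piped)   # TRUE
```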
Instead of mean(data$column) we can use summarise() (also spelled summarize()) and pipes.
Grand mean of mass
my.frogs %>% summarise(mean(mass))
This is maybe more complicated than "mean(my.frogs$mass)", but overall the pipe framework and summarise() pay off when combined with group_by().
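One difference worth knowing: summarise() hands back a dataframe with one row, not a bare number. The sketch below, on a made-up dataframe called toy, shows one way to get the number back out using dplyr's pull(). The column label mass.mean is just a name chosen for this example.

```r
library(dplyr)

toy <- data.frame(mass = c(24.1, 26.5, 22.9, 27.0))   # made-up masses

res <- toy %>% summarise(mass.mean = mean(mass))

is.data.frame(res)        # TRUE: the result is a one-row dataframe
res %>% pull(mass.mean)   # extract the column as a plain number
mean(toy$mass)            # same value, computed the base R way
```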
For some more info on group_by see
https://www.r-bloggers.com/using-r-quickly-calculating-summary-statistics-with-dplyr/
https://www3.nd.edu/~steve/computing_with_data/24_dplyr/dplyr.html
http://www.datacarpentry.org/R-genomics/04-dplyr.html
We can use group_by() to split things up by a categorical variable. Here, we can say "take my.frogs, split up the data by the sex column, and apply the mean function to each subset."
my.frogs %>% group_by(sex) %>% summarise(mean(mass))
Note that the column heading in the output is mean(mass), which is what is in summarise().
A handy thing about summarise() is that you can pass it labels.
Mean mass by sex w/ label
my.frogs %>% group_by(sex) %>% summarise(mass.mean = mean(mass))
You can label things anything, e.g. "puppies".
my.frogs %>% group_by(sex) %>% summarise(puppies = mean(mass))
You can pass any summary function to summarise(). We can give it sd() to get the sd of mass by sex.
my.frogs %>% group_by(sex) %>% summarise(mass.sd = sd(mass))
What makes dplyr's group_by() and summarise() really powerful is that you can pass summarise() multiple summary functions at the same time
my.frogs %>% group_by(sex) %>% summarise(mass.mean = mean(mass), mass.sd = sd(mass))
dplyr has a handy function n() for getting your sample size.
my.frogs %>% group_by(sex) %>% summarise(mass.mean = mean(mass), mass.sd = sd(mass), n = n())
Pass it a novel function, such as the SE function written earlier
my.frogs %>% group_by(sex) %>% summarise(mass.se = my_sd1(mass))
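Putting these pieces together, here is a minimal sketch of a full summary table by group on made-up data. The dataframe toy and the helper se_mean are hypothetical names for this example. It also shows that a column created earlier inside summarise() (here mass.sd and n) can be reused later in the same call, which gives a second route to the SE.

```r
library(dplyr)

# Made-up data: 4 females and 4 males
toy <- data.frame(sex  = rep(c("female", "male"), each = 4),
                  mass = c(24.1, 26.5, 22.9, 27.0, 30.2, 28.8, 31.5, 29.9))

# Hypothetical helper for the standard error of the mean
se_mean <- function(x) sd(x) / sqrt(length(x))

out <- toy %>%
  group_by(sex) %>%
  summarise(mass.mean = mean(mass),
            mass.sd   = sd(mass),
            n         = n(),
            mass.se   = se_mean(mass),        # SE from the helper
            mass.se2  = mass.sd / sqrt(n))    # SE from the columns above

out
```

The two SE columns should agree, which is a quick check that the helper is doing what is intended.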
The doBy package has a nice syntax. I don't really see many people use it.
library(doBy)
summaryBy(mass ~ sex, data = my.frogs, FUN = mean)
summaryBy(mass ~ sex, data = my.frogs, FUN = c(mean, sd))
tapply is pretty old school
tapply(X = my.frogs$mass, INDEX = my.frogs$sex, FUN = mean)
The dcast() function from reshape2 is what I've used most of my career thus far. I am slowly switching to dplyr.
library(reshape2)
dcast(data = my.frogs, formula = sex ~ ., value.var = "mass", fun.aggregate = mean)
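Whichever syntax is used, the group means should come out the same. Here is a minimal sketch that cross-checks tapply() against the dplyr approach on made-up data (toy is not the frog data). It only needs base R and dplyr.

```r
library(dplyr)

# Made-up data: 4 females and 4 males
toy <- data.frame(sex  = rep(c("female", "male"), each = 4),
                  mass = c(24.1, 26.5, 22.9, 27.0, 30.2, 28.8, 31.5, 29.9))

# Old school: tapply() returns a named array, one entry per group
by.tapply <- tapply(X = toy$mass, INDEX = toy$sex, FUN = mean)

# dplyr: group_by() plus summarise() returns a dataframe
by.dplyr <- toy %>% group_by(sex) %>% summarise(mass.mean = mean(mass))

# Both order the groups alphabetically here, so the values line up
all.equal(as.numeric(by.tapply), by.dplyr$mass.mean)   # TRUE
```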