In ChiJingLeow/R_Package_Leow: An R package for gene expression analyses

Due Nov. 12 at Midnight.

Complete this activity in your R_package_lastname project.

Please have a look at this paper. It explains the relationship between air temperature, butterfly emergence, and plant flowering.

Phenology

Phenology is the study of when events happen in organisms' life cycles, for example the first flowering of a plant for the year or the emergence of forager bees. Typically, organisms take cues from their environment. Commonly, these cues are changes in photoperiod (how much light there is per day) or temperature.

When organisms that rely on one another fall out of sync, it can be a problem. For example, if a plant starts to flower when the soil is warm enough, flowering will often lag behind a period of warmer air temperatures because soil has a higher heat capacity than air. What happens to bees that emerge during that warm spell and don't find any food? This is called ecological mismatch. Originally described in 2004 by Winder and Schindler, ecological mismatch can mean that one or both species don't have their needs met by the environment.

What we're going to do today is look at some historical and present-day ecological data. What were the temperatures historically? And when did butterflies emerge from their cocoons? Butterflies need to be warm to incubate, exit the cocoon, and go into flight.

Before trying the test, make sure you've read the Kharouba and Vellend paper and understand the hypotheses they were testing.

Project Two

Part One. 55 points.

download.file(url = "https://raw.githubusercontent.com/Paleantology/GBIO153H/master/data/Butterfly_data.csv", destfile = "/cloud/project/Data/Butterfly_data.csv")
download.file(url = "https://raw.githubusercontent.com/Paleantology/GBIO153H/master/data/Phenology_data.csv", destfile = "/cloud/project/Data/Phenology_data.csv")

Next, let's read in both the butterfly and phenology datasets. 5 pts.

library(tidyverse)  # provides read_csv(), %>%, and ggplot()

Butterfly_data <- read_csv("/cloud/project/Data/Butterfly_data.csv")
Phenology_data <- read_csv("/cloud/project/Data/Phenology_data.csv")

1) How many unique species of butterfly are in the dataset? (5 pts)

# Keep one row per species, then count the rows
unique_species <- Butterfly_data %>%
  distinct(ButterflySpecies)
nrow(unique_species)

#CJL: There are 12 unique species in the dataset.
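As a cross-check on the species count, here is a minimal sketch using `n_distinct()` from dplyr. The `toy_butterflies` data frame and its species names are made up for illustration; only the `ButterflySpecies` column name comes from the assignment data.

```r
library(dplyr)

# Hypothetical toy data standing in for Butterfly_data (values are made up).
toy_butterflies <- data.frame(
  ButterflySpecies = c("Species A", "Species B", "Species A", "Species C"),
  Day = c(120, 135, 118, 140)
)

# n_distinct() returns the number of unique values as a single integer.
n_species <- n_distinct(toy_butterflies$ButterflySpecies)
n_species

# count() gives one row per species with the number of records for each.
records_per_species <- count(toy_butterflies, ButterflySpecies)
records_per_species
```

Running the same two calls on `Butterfly_data` should give a single number from `n_distinct()` and a table whose row count matches it.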

2) Check out the relationship between temperature and time. First, let's plot it. Choose an appropriate plot type, and plot the year vs. temperature. (5 pts)

ggplot(data = Phenology_data, mapping = aes(x = Year, y = AnnualTemp)) + geom_point()

3) It looks like there might be a relationship between these two variables. We'll check this with a linear regression. Use stat_smooth to add a regression line to the plot. (5 pts)

ggplot(data = Phenology_data, mapping = aes(x = Year, y = AnnualTemp)) + geom_point() + stat_smooth(method = "lm")
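`stat_smooth(method = "lm")` draws the least-squares line but does not print its coefficients. A minimal sketch of fitting the same kind of model with `lm()` to read off the slope follows. The `toy_phenology` values are made up for illustration; only the `Year` and `AnnualTemp` column names come from the assignment data.

```r
# Hypothetical toy data standing in for Phenology_data (values are made up).
toy_phenology <- data.frame(
  Year = c(2000, 2001, 2002, 2003, 2004),
  AnnualTemp = c(9.1, 9.4, 10.2, 10.3, 11.0)
)

# Fit the same kind of straight line that stat_smooth(method = "lm") draws.
temp_fit <- lm(AnnualTemp ~ Year, data = toy_phenology)

# The slope is the estimated change in temperature per year.
slope <- unname(coef(temp_fit)["Year"])
slope
```

A positive slope here would correspond to an upward-tilting line in the plot.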

Let's take a look at the actual butterflies.

4) If butterflies emerge based on temperature, what would we expect to be the relationship between Spring temperature and day of emergence? Summer temperature and day of emergence? Year and day? Test all three out below. Which predictor (year, spring temp, summer temp) do you think is most relevant? (10 pts)

ggplot(data = Butterfly_data, mapping = aes(x = SpringTemp, y = Day)) + geom_point()
ggplot(data = Butterfly_data, mapping = aes(x = SummerTemp, y = Day)) + geom_point()
ggplot(data = Butterfly_data, mapping = aes(x = Year, y = Day)) + geom_point()

#CJL: I think SpringTemp is the best predictor.

5) Is there a significant linear relationship between any predictors and the response? (10 pts)

model_fit_SpringTemp <- lm(Day ~ SpringTemp, data = Butterfly_data)
model_fit_SummerTemp <- lm(Day ~ SummerTemp, data = Butterfly_data)
model_fit_Year <- lm(Day ~ Year, data = Butterfly_data)

summary(model_fit_SpringTemp)
summary(model_fit_SummerTemp)
summary(model_fit_Year)

#CJL:
The adjusted R^2 values and p-values for the predictors:
Predictor    Adjusted R^2   p-value
SpringTemp   0.1717         1.45e-07
SummerTemp   0.5313         0.003151
Year         0.01541        0.07406

Although the p-value for the SpringTemp predictor is significant, its R^2 value is very low. I conclude that none of these predictors has a strong linear relationship with the response.
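The adjusted R^2 and the slope p-value answer different questions: the p-value tests whether the slope differs from zero, while the adjusted R^2 describes how much of the variation in the response the model accounts for. A minimal sketch of pulling both numbers out of a `summary()` object, rather than copying them by hand, is shown here. The `toy_butterflies` values are made up for illustration; only the `SpringTemp` and `Day` column names come from the assignment data.

```r
# Hypothetical toy data standing in for Butterfly_data (values are made up).
toy_butterflies <- data.frame(
  SpringTemp = c(8.0, 9.0, 10.0, 11.0, 12.0, 13.0),
  Day = c(150, 147, 141, 140, 133, 131)
)

fit <- lm(Day ~ SpringTemp, data = toy_butterflies)
fit_summary <- summary(fit)

# Adjusted R^2: share of variation in Day explained, penalized for model size.
adj_r2 <- fit_summary$adj.r.squared

# p-value for the slope: row "SpringTemp", column "Pr(>|t|)" of the coefficient table.
slope_p <- coef(fit_summary)["SpringTemp", "Pr(>|t|)"]

c(adj_r2 = adj_r2, slope_p = slope_p)
```

The same two extraction lines should work on `summary(model_fit_SpringTemp)` and the other two fitted models, with the row name changed to match each predictor.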

6) It looks like different animals might have different relationships to the predictor variables. Try plotting them out as facets. (5 pts)

ggplot(data = Butterfly_data, mapping = aes(x = SpringTemp, y = Day)) + geom_point() + facet_wrap(facets = vars(ButterflySpecies))
ggplot(data = Butterfly_data, mapping = aes(x = SummerTemp, y = Day)) + geom_point() + facet_wrap(facets = vars(ButterflySpecies))
ggplot(data = Butterfly_data, mapping = aes(x = Year, y = Day)) + geom_point() + facet_wrap(facets = vars(ButterflySpecies))
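Facets show each species in its own panel. One rough numeric companion is to fit a separate regression per species and compare the slopes. The sketch below does this in base R with `split()` and `sapply()`. The `toy_butterflies` values and species names are made up for illustration; only the column names come from the assignment data.

```r
# Hypothetical toy data standing in for Butterfly_data (values are made up).
toy_butterflies <- data.frame(
  ButterflySpecies = rep(c("Species A", "Species B"), each = 4),
  SpringTemp = c(8, 9, 10, 11, 8, 9, 10, 11),
  Day = c(150, 146, 142, 138, 130, 131, 129, 130)
)

# Split the rows by species, then fit one linear model per species.
by_species <- split(toy_butterflies, toy_butterflies$ButterflySpecies)
slopes <- sapply(by_species, function(species_rows) {
  fit <- lm(Day ~ SpringTemp, data = species_rows)
  unname(coef(fit)["SpringTemp"])
})

# Named numeric vector: one SpringTemp slope per species.
slopes
```

In this toy example one species has a steep negative slope and the other is nearly flat, which is the kind of difference a pooled regression across all species can hide.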

7) Join our two datasets together and remake the facet plot above. Does this change your opinion of the relatedness of variables? (5 pts)

joined <- inner_join(Butterfly_data, Phenology_data, by = "Year")
ggplot(data = joined, mapping = aes(x = AnnualTemp, y = Day)) + geom_point() + facet_wrap(facets = vars(ButterflySpecies)) + stat_smooth(method = "lm")

#CJL: It did not change my opinion. The data show no correlation between annual temp and day of emergence, except in Papilio zelicaon, where emergence seems to have occurred earlier than usual when the temperature was high.
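When joining two tables by `Year`, it helps to know which rows survive. The sketch below shows that `inner_join()` keeps only years present in both tables and uses `anti_join()` to list what was dropped. Both toy tables and their values are made up for illustration; only the column names come from the assignment data.

```r
library(dplyr)

# Hypothetical toy tables (values are made up); only the column names
# Year, ButterflySpecies, Day, and AnnualTemp come from the assignment.
toy_butterflies <- data.frame(
  Year = c(2000, 2000, 2001, 2003),
  ButterflySpecies = c("Species A", "Species B", "Species A", "Species A"),
  Day = c(140, 150, 138, 135)
)
toy_phenology <- data.frame(
  Year = c(2000, 2001, 2002),
  AnnualTemp = c(9.1, 9.4, 10.2)
)

# inner_join() keeps only rows whose Year appears in both tables,
# so the 2003 butterfly record (no temperature) is dropped.
toy_joined <- inner_join(toy_butterflies, toy_phenology, by = "Year")
toy_joined

# anti_join() lists the butterfly rows that found no match.
unmatched <- anti_join(toy_butterflies, toy_phenology, by = "Year")
unmatched
```

Comparing `nrow()` before and after the join on the real data is a quick way to see whether any butterfly records were lost.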

8) Wrap up: What is the relationship between temperature (remember that we looked at 3 kinds of temperature) and day of emergence? Is it the same for all species? How does this relate back to Kharouba and Vellend's hypotheses? (10 pts)

#CJL: Overall, the data suggest no significant correlation between temperature and the day of emergence across species. This differs from Kharouba and Vellend, who found that spring and summer temperatures had a significant effect on the day of emergence.

Part Two. 30 points.

For each of your function files, add to the .R:

Expected function inputs. For example, if you will be plotting a histogram, do you want a data frame as input? A data frame and the name of the column you want to plot?
Expected outputs. Will your function return a plot? A single number? A vector of numbers?
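One possible way to record expected inputs and outputs is a comment header at the top of each function file. The function `plot_column_histogram()` and its arguments are hypothetical, invented here only for illustration; the assignment does not prescribe a particular function or comment layout.

```r
library(ggplot2)

# plot_column_histogram (hypothetical example function)
#
# Inputs:
#   data        - a data frame
#   column_name - a single character string naming a numeric column in data
#   bins        - number of histogram bins (single number, default 10)
#
# Output:
#   a ggplot object showing a histogram of the chosen column
plot_column_histogram <- function(data, column_name, bins = 10) {
  # Fail early if the inputs do not match the documented expectations.
  stopifnot(is.data.frame(data), column_name %in% names(data))
  ggplot(data, aes(x = .data[[column_name]])) +
    geom_histogram(bins = bins)
}

# Usage with the built-in mtcars data set.
mpg_plot <- plot_column_histogram(mtcars, "mpg", bins = 8)
```

Writing the header first makes it easier to check that the function body really returns what the comment promises, in this case a plot object and not a number or vector.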