This homework will prepare you to analyze the data from the Heterogeneity lab. You should read over that the material to properly understand the data you'll be working with here. I will assume that you are familiar with the material used in the Succession homework (which references Chapters \@ref(r-intro), \@ref(r-ggplot), and \@ref(r-dplyr), and Sections \@ref(r-stats-factors) and \@ref(r-stats-cat)). In addition, you should read Section \@ref(r-stats-lm).
What are the relationships between canopy, shrub, and ground cover?
How do the relative abundances of Canopy, Shrub, and Ground cover compare with historical observations?
You'll also be doing to other things that aren't specifically hypothesis or question driven, but should be presented in the results.
Calibrating your subjective estimates of canopy estimates with the Gap Light Analyzer calculations. This in particular isn't a biological hypothesis, but comparing different methods for addressing the same question is a common feature of biological research.
You should download two datasets from Canvas and put them in your data
folder:
Begin your script with the following:
# Heterogeneity Lab Analysis #### Setup #### library(tidyverse) theme_set(theme_classic()) # Removes gridlines from ggplot # Data file locations: Change these when analyzing the current dataset current_data <- read_csv("data/F20_heterogeneity_data.csv") historical_data_full <- read_csv("data/historical_data_Fall_03_to_19.csv")
Save the R file as R/heterogeneity_homework.R
.
The columns of these data frames that are relevant are:
Using the current data, create contingency tables for each pairwise combination of Canopy, Shrub, and Ground (three tables total); for each of these, run chi-squared tests (or Fisher's tests if necessary). Instead of including the contingency table directly, we're going to create figures that show colored heat maps.
your_contingency_table |> # replace with your contingency table name as_tibble() |> # Convert to a data frame ggplot() + aes(x = , y = ) + # Finish these geom_tile(aes(fill = n)) + # Makes colored squares based on count scale_fill_viridis_c() + # Nice color scale coord_fixed() + # keeps x & y dimensions equal # Add text to the squares geom_label(aes(label = round(n, 3)), # Include the count as a label; the rounding is there in case you want to switch it to a proportion label.size = 0, # Keep this as is # Feel free to adjust these details # to get an image you like color = "white", # Text color fill = "#00000010", # Slight background around text to make it more legible size = 10) # text size;
To look at the interaction of all three levels, create four data sets for each different shrub level using the filter()
function and run separate Chi-squared / Fisher's exact tests.
To visualize the table, you'll need to make a 3-dimensional contingency table and manipulate it:
contingency_3d <- current_data |> select(Canopy, Shrub, Ground) |> table() |> as_tibble() |> mutate(Shrub = paste("Shrub = ", Shrub)) # Modify shrub to include the label; this will help with faceting
Then use this with the figure-making code, but use facet_wrap()
(Section \@ref(r-ggplot-facet)) to make separate panels for each shrub level.
Your final figure for this should look like this:
contingency_3d |> ggplot(aes(x = Canopy, y = Ground)) + geom_tile(aes(fill = n)) + # Makes colored squares based on count scale_fill_viridis_c() + # Nice color scale facet_wrap(~Shrub, nrow = 2) + coord_fixed() + # keeps x & y dimensions equal # Add text to the squares geom_label(aes(label = round(n, 3)), # Include the count as a label; the rounding is there in case you want to switch it to a proportion label.size = 0, # Keep this as is # Feel free to adjust these details # to get an image you like color = "white", # Text color fill = "#00000010", # Slight background around text to make it more legible size = 10) # text size;
You'll need to make a dataset that contains the current data and the semester of interest (which is Fall 2004).
First, filter historical_data_full
so that it only includes Fall 2004's data (save as historical_data_F04
).
Then, you'll want to modify the current_data
so that it has columns for the Semester and Year (Fall 2020) before binding it to the 2004 data:
historical_comparison_data <- current_data |> mutate( ) |> # Fill this in; new columns should match the columns in historical_data_full bind_rows(historical_data_F04) # Combine w/ F04 data
From here, you'll need to make 3 more contingency tables (Year vs. each of the cover levels), run the appropriate tests on them, and visualize them.
The cover levels here are subjectively determined; it's probably a good idea to see how well the subjective scores match with a more objective measure.
We'll do this with a linear regression (Section \@ref(r-stats-lm)).
To see how well subjective scores predict the GLA percent canopy openness, run a regression with Canopy
as the predictor (X) and Percent_Cnpy_Open
as the response (Y).
When reporting these results, focus on the $R^2$ statistic, as it indicates the amount of variation in canopy openness is explained by the subjective scores.
Create a figure of the regression with both a regression line and individual data points (See \@ref(r-ggplot-contx-conty) for examples).
For the data points, use geom_jitter(height = 0, width = 0.15)
instead of geom_point()
, so that all of the points aren't lined up with each other.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.