Homework: Heterogeneity {#hw-heterogeneity}

Required Reading

This homework will prepare you to analyze the data from the Heterogeneity lab. You should read over that material to properly understand the data you'll be working with here. I will assume that you are familiar with the material used in the Succession homework (which references Chapters \@ref(r-intro), \@ref(r-ggplot), and \@ref(r-dplyr), and Sections \@ref(r-stats-factors) and \@ref(r-stats-cat)). In addition, you should read Section \@ref(r-stats-lm).

Objectives (and what's due)

You'll also be doing two other things that aren't specifically hypothesis- or question-driven, but that should still be presented in the results.

Getting started

You should download two datasets from Canvas and put them in your data folder:

Begin your script with the following:

# Heterogeneity Lab Analysis
#### Setup ####
library(tidyverse)
theme_set(theme_classic()) # Removes gridlines from ggplot

# Data file locations: Change these when analyzing the current dataset
current_data <- read_csv("data/F20_heterogeneity_data.csv")

historical_data_full <- read_csv("data/historical_data_Fall_03_to_19.csv")

Save the R file as R/heterogeneity_homework.R.

The data

The columns of these data frames that are relevant are:

Relationships between Cover Levels

Using the current data, create contingency tables for each pairwise combination of Canopy, Shrub, and Ground (three tables total); for each of these, run chi-squared tests (or Fisher's tests if necessary). Instead of including the contingency table directly, we're going to create figures that show colored heat maps.

your_contingency_table |> # replace with your contingency table name
  as_tibble() |> # Convert to a data frame
  ggplot()  + 
  aes(x = , y = ) + # Finish these
  geom_tile(aes(fill = n)) + # Makes colored squares based on count
  scale_fill_viridis_c() + # Nice color scale
  coord_fixed() + # keeps x & y dimensions equal
  # Add text to the squares
  geom_label(aes(label = round(n, 3)), # Include the count as a label; the rounding is there in case you want to switch it to a proportion
            label.size = 0, # Keep this as is
            # Feel free to adjust these details
            # to get an image you like
            color = "white", # Text color
            fill = "#00000010", # Slight background around text to make it more legible
            size = 10) # Text size
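
Before plotting, each table needs to be built and tested. The following is a minimal sketch of that step for one pair of cover levels. It uses made-up data (`toy_data`, with cover levels assumed to be coded 0 to 3), so swap in `current_data` and check how the real columns are coded. The "expected counts below about 5" check is a common rule of thumb, not a requirement stated in this assignment.

```r
library(tidyverse)

# Made-up example data; the 0-3 coding of cover levels is an assumption
set.seed(1)
toy_data <- tibble(
  Canopy = sample(0:3, 200, replace = TRUE),
  Ground = sample(0:3, 200, replace = TRUE)
)

# Two-way contingency table of counts
canopy_ground_table <- toy_data |>
  select(Canopy, Ground) |>
  table()

# Chi-squared test of independence
chi_result <- chisq.test(canopy_ground_table)

# Inspect the expected counts; small values (rule of thumb: below about 5)
# suggest using Fisher's exact test instead
chi_result$expected

# Fisher's test; simulate.p.value = TRUE estimates the p-value by Monte Carlo,
# which avoids memory problems on tables larger than 2 x 2
fisher_result <- fisher.test(canopy_ground_table, simulate.p.value = TRUE)
fisher_result$p.value
```

The same pattern applies to the other two pairs; only the columns passed to `select()` change.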

To look at the interaction of all three levels, use the filter() function to create four data sets, one for each shrub level, and run a separate chi-squared or Fisher's exact test on each. To visualize the table, you'll need to make a 3-dimensional contingency table and manipulate it:

contingency_3d <- current_data |> 
  select(Canopy, Shrub, Ground) |> 
  table() |> 
  as_tibble() |> 
  mutate(Shrub = paste("Shrub =", Shrub)) # Modify shrub to include the label; this will help with faceting (paste() adds the space itself)

Then reuse the figure-making code with this table, adding facet_wrap() (Section \@ref(r-ggplot-facet)) to make a separate panel for each shrub level. The following code produces the final figure:

contingency_3d |> 
  ggplot(aes(x = Canopy, y = Ground)) + 
  geom_tile(aes(fill = n)) + # Makes colored squares based on count
  scale_fill_viridis_c() + # Nice color scale
  facet_wrap(~Shrub, nrow = 2) + 
  coord_fixed() + # keeps x & y dimensions equal
  # Add text to the squares
  geom_label(aes(label = round(n, 3)), # Include the count as a label; the rounding is there in case you want to switch it to a proportion
            label.size = 0, # Keep this as is
            # Feel free to adjust these details
            # to get an image you like
            color = "white", # Text color
            fill = "#00000010", # Slight background around text to make it more legible
            size = 10) # Text size
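
The separate tests for each shrub level could be set up along these lines. This is only a sketch on made-up data (`toy_data`, with levels assumed to be coded 0 to 3); the object names are placeholders, and the real analysis should use `current_data` and whatever shrub levels it actually contains.

```r
library(tidyverse)

# Made-up example data; the 0-3 coding of cover levels is an assumption
set.seed(2)
toy_data <- tibble(
  Canopy = sample(0:3, 400, replace = TRUE),
  Shrub  = sample(0:3, 400, replace = TRUE),
  Ground = sample(0:3, 400, replace = TRUE)
)

# One data set per shrub level; shown here for a single (assumed) level
shrub_0 <- toy_data |>
  filter(Shrub == 0)

# Canopy x Ground contingency table within that shrub level
shrub_0_table <- shrub_0 |>
  select(Canopy, Ground) |>
  table()

# Test for an association between Canopy and Ground at this shrub level
shrub_0_test <- fisher.test(shrub_0_table, simulate.p.value = TRUE)
shrub_0_test$p.value

# Repeat the filter / table / test steps for each remaining shrub level
```

Splitting by shrub level means each test uses fewer observations, so it is worth checking the counts in each table before deciding between the chi-squared and Fisher's tests.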

Relationship Between Current and Historical Cover Levels

You'll need to make a dataset that contains the current data and the semester of interest (which is Fall 2004). First, filter historical_data_full so that it only includes Fall 2004's data (save as historical_data_F04). Then, you'll want to modify the current_data so that it has columns for the Semester and Year (Fall 2020) before binding it to the 2004 data:

historical_comparison_data <- current_data |> 
  mutate(   ) |>  # Fill this in; new columns should match the columns in historical_data_full
  bind_rows(historical_data_F04) # Combine w/ F04 data

From here, you'll need to make three more contingency tables (Year vs. each of the cover levels), run the appropriate tests on them, and visualize them.
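
As an illustration of the filter, mutate, and bind pattern described above, here is a small sketch using hypothetical stand-in data frames (`toy_current` and `toy_historical`). How Semester and Year are stored here (text and number) is an assumption; look at `historical_data_full` to see how the real columns are coded before matching them.

```r
library(tidyverse)

# Hypothetical stand-ins for current_data and historical_data_full
toy_current <- tibble(Canopy = c(1, 2, 3), Shrub = c(0, 1, 2))
toy_historical <- tibble(
  Semester = c("Fall", "Fall", "Fall", "Fall"),
  Year     = c(2003, 2004, 2004, 2005),
  Canopy   = c(0, 2, 3, 1),
  Shrub    = c(2, 1, 1, 0)
)

# Keep only the semester of interest
toy_F04 <- toy_historical |>
  filter(Semester == "Fall", Year == 2004)

# Add matching identifier columns, then stack the two data sets
toy_comparison <- toy_current |>
  mutate(Semester = "Fall", Year = 2020) |>
  bind_rows(toy_F04)

# Year vs. Canopy contingency table, ready for testing and plotting
year_canopy_table <- toy_comparison |>
  select(Year, Canopy) |>
  table()
year_canopy_table
```

bind_rows() matches columns by name, so the new columns must have the same names and compatible types as the historical ones; otherwise you'll get extra columns filled with NA or a type error.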

Calibrating Subjective Scores with Gap Light Analyzer

The cover levels here are subjectively determined, so it's a good idea to see how well the subjective scores match a more objective measure. We'll do this with a linear regression (Section \@ref(r-stats-lm)). To see how well subjective scores predict the GLA percent canopy openness, run a regression with Canopy as the predictor (X) and Percent_Cnpy_Open as the response (Y). When reporting these results, focus on the $R^2$ statistic, as it indicates how much of the variation in canopy openness is explained by the subjective scores.
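
A regression like this could be fit as sketched below. The data are simulated (`toy_gla`), and both the 0 to 3 scoring and the direction of the relationship (higher score, less openness) are assumptions made purely for illustration.

```r
# Simulated stand-in data; the scoring and the trend are assumptions
set.seed(3)
toy_gla <- data.frame(Canopy = rep(0:3, each = 10))
toy_gla$Percent_Cnpy_Open <- 60 - 15 * toy_gla$Canopy + rnorm(40, sd = 5)

# Linear regression: response ~ predictor
canopy_lm <- lm(Percent_Cnpy_Open ~ Canopy, data = toy_gla)

# summary() reports the coefficients, p-values, and R-squared
canopy_summary <- summary(canopy_lm)
canopy_summary

# Pull out R-squared directly
canopy_summary$r.squared
```

An $R^2$ near 1 would mean the subjective scores track the GLA measurements closely; a value near 0 would mean they explain little of the variation.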

Create a figure of the regression with both a regression line and individual data points (see Section \@ref(r-ggplot-contx-conty) for examples). For the data points, use geom_jitter(height = 0, width = 0.15) instead of geom_point(), so that the points don't all stack on top of each other at each canopy score.
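
One way to build such a figure is sketched here on simulated data (`toy_gla`, with assumed scoring and trend). The axis labels are placeholders, and geom_smooth(method = "lm") is one option for drawing the fitted line; the referenced section may use a different approach.

```r
library(ggplot2)

# Simulated stand-in data; the scoring and the trend are assumptions
set.seed(4)
toy_gla <- data.frame(Canopy = rep(0:3, each = 10))
toy_gla$Percent_Cnpy_Open <- 60 - 15 * toy_gla$Canopy + rnorm(40, sd = 5)

canopy_plot <- ggplot(toy_gla, aes(x = Canopy, y = Percent_Cnpy_Open)) +
  # Jitter horizontally only, so the y values stay accurate
  geom_jitter(height = 0, width = 0.15) +
  # Fitted regression line with its confidence band
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Subjective canopy score", y = "Canopy openness (%)")

canopy_plot
```

Setting height = 0 matters: jittering vertically would change the plotted openness values, while a small horizontal jitter only spreads out points that share the same score.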



Christopher-Peterson/Bio373L-Book documentation built on Oct. 26, 2022, 8:36 a.m.