This homework will prepare you to analyze the data from the succession lab. You should read over that material to properly understand the data you'll be working with here. Information on the data-analysis techniques you'll need to use can be found in Chapters \@ref(r-intro), \@ref(r-ggplot), and \@ref(r-dplyr), as well as Sections \@ref(r-stats-factors) and \@ref(r-stats-cat).
Please attempt this to the best of your ability. Once you've read the above chapters, I will be happy to meet with you (individually or in groups) to provide guidance and support on this material.
The analysis for this lab should answer:
These should be submitted as a Word document or PDF. The figures and tables you create should follow the guides in Chapter \@ref(figures).
As this is the first stats homework, let's make sure you're using RStudio correctly.
Check R Version: Open RStudio. In the Console, your R version should be listed. If it is less than 4.0, you need to re-install R. Download R here.
Check RStudio Version: Go to Help -> About RStudio. If the version listed is lower than 1.4.1717-3, then you need to re-install it. Note that very recent versions are numbered with a date (e.g., v. 2021.9.1.372); these are good. Download RStudio Desktop here.
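If you would rather check the R version with code than read it off the Console banner, a minimal sketch is below. It assumes the minimum version of 4.0 given above; the variable names are only illustrative.

```r
# Minimum R version required for this homework (taken from the text above)
min_r_version <- "4.0"

# getRversion() returns the version of the R session that is currently running
r_ok <- getRversion() >= min_r_version

if (r_ok) {
  message("R version is recent enough: ", as.character(getRversion()))
} else {
  message("R is older than ", min_r_version, "; please re-install it.")
}
```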
Check package versions: In the console, run `library(tidyverse)`. If this fails, run `install.packages("tidyverse")`, then start this over again. Tidyverse is a collection of packages that are designed to work well together. You should see the header "Attaching packages", followed by a list of package names and version numbers. Make sure that the following packages are at least this high:
- ggplot2: 3.3.3
- tibble: 3.1.0
- tidyr: 1.1.3
- readr: 2.0.0
- purrr: 0.3.4
- dplyr: 1.0.5
- stringr: 1.4.0
- forcats: 0.5.1
You should also click the Packages tab in one of the RStudio panels and verify that you have the following:
- cowplot
- vegan
- ggforce
If any of these are missing or out of date, use `install.packages(c("package1", "package2", "etc"))` to get them.
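The version comparison can also be done in code. The sketch below is one possible way to do it, using the minimum versions listed above; `check_pkg()` is a hypothetical helper written for this example, not part of any package.

```r
# Minimum versions from the list above
min_versions <- c(
  ggplot2 = "3.3.3", tibble = "3.1.0", tidyr = "1.1.3", readr = "2.0.0",
  purrr = "0.3.4", dplyr = "1.0.5", stringr = "1.4.0", forcats = "0.5.1"
)

# Hypothetical helper: TRUE if new enough, FALSE if too old, NA if not installed
check_pkg <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA)
  }
  packageVersion(pkg) >= min_version
}

# One result per package, named by package
status <- mapply(check_pkg, names(min_versions), min_versions)
status
```

Any package showing `FALSE` or `NA` is a candidate for `install.packages()`.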
In the upper-right corner of the RStudio window, there should be a box that says Project: (None). Click this, then on the drop-down menu select New Project...
If you already have a folder on your computer for Bio 373L, select Existing Directory, then hit Browse and navigate to that folder. Otherwise, select New Directory -> New Project, and name the directory "Field Ecology" or "Bio 373L" or whatever; place it as a subdirectory wherever you normally keep your class files. Click Create Project.
RStudio will refresh, and it should now say BIO 373L or whatever you named the project in the upper right corner. Whenever you open RStudio for this class, make sure the project name is there. If it isn't, click that box and you should be able to select the project from the drop-down menu.
Click on the Files tab (in one of the RStudio panels). This will give you a list of files & folders that are part of the RStudio project. From here, use the New Folder button to create the following subdirectories:
- data: for raw data files
- R: for saving script files
- output: for saving the results of your analyses
- figures: for saving graphs
Organizing your project like this is a good way to keep your files tidy and easy to find.
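The New Folder button is all you need, but the same folders can be created from code. This is a minimal sketch; `make_project_dirs()` is a hypothetical helper, and the demonstration uses a temporary directory so that it does not touch your real project.

```r
# Hypothetical helper: create the four project subdirectories under `root`
make_project_dirs <- function(root = ".") {
  dirs <- file.path(root, c("data", "R", "output", "figures"))
  for (d in dirs) {
    # showWarnings = FALSE keeps this quiet if the folder already exists
    dir.create(d, showWarnings = FALSE, recursive = TRUE)
  }
  invisible(dirs)
}

# Demonstration in a throwaway location
demo_root <- file.path(tempdir(), "demo_project")
made <- make_project_dirs(demo_root)
dir.exists(made)
```

Inside your RStudio project you would call `make_project_dirs()` with no arguments, since the working directory is the project folder.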
Create a new R script with Ctrl + Shift + N or File -> New File -> R Script. This will open up the script panel.
Start your script with this code:
    # Succession Lab Analysis Homework
    ## Setup ####
    # Load Required Packages
    library(tidyverse)
    library(cowplot)
Put your cursor on each line and hit Ctrl + Enter (or Cmd + Return on a Mac). This will send that line to the console and run it.
Entering your commands in a script makes it a lot easier to see what you've done, repeat it, or modify it.
You should only enter code in the console if you don't want a record of it (which should be an unusual circumstance).
Save your script as `R/hw_succession.R`.
For the homework, we'll use a dataset from Fall 2019.
Download succession_long_F19.csv and save it in the `data` folder.
To read and view the data, add the following lines to your script and run them:
    succession_data <- read_csv("data/succession_long_F19.csv")
    View(succession_data)
Note that I used `read_csv()`, not `read.csv()`. Always use the version with the underscore; it's part of the `readr` package (which is part of `tidyverse`) and is generally faster and more consistent than `read.csv()`.
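To see what `read_csv()` returns without needing a file, here is a small sketch that parses made-up CSV text. The column names and values are invented for illustration; wrapping the text in `I()` tells readr (version 2.0.0 or later) to treat it as literal data rather than a file path.

```r
library(readr)

# Made-up CSV text, used only for illustration
toy_csv <- "Species,Distance\nOak,2.5\nElm,4\n"

# show_col_types = FALSE suppresses the column-type message
toy_tbl <- read_csv(I(toy_csv), show_col_types = FALSE)

class(toy_tbl)  # a tibble, which is also a data frame
toy_tbl
```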
RStudio also has a data import tool.
Do not use it.
You should import your data with code so that you can go back later and see exactly what you've done.
Your data should look something like this:
View(succession_data)
This data is in a tidy format: each row is an observation (a single recorded tree), each column is a separate variable. These are the columns that are relevant to the analysis:
The remaining columns (Team, Quadrant, Location, Location_type, Location_angle, DBI, Drought_stress, N_dead) are not relevant to this analysis and can be ignored.
Let's trim this data by removing the unnecessary columns.
Use a `select()` function to make a copy of the data that doesn't have the unnecessary columns. Save the resulting data frame as `succession_data_thin`.
(See Section \@ref(r-dplyr-select); Section \@ref(r-dplyr-pipe) is also helpful.)
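As a small illustration of how `select()` drops columns, here is a sketch on a made-up data frame. The columns `Notes` and `Recorder` are hypothetical and do not come from the succession data.

```r
library(dplyr)

# Made-up data; Notes and Recorder stand in for columns you do not need
toy <- data.frame(
  Species  = c("Oak", "Elm"),
  Distance = c(2.5, 4),
  Notes    = c("x", "y"),
  Recorder = c("A", "B")
)

# A minus sign removes a column; everything else is kept
toy_thin <- toy |>
  select(-Notes, -Recorder)

names(toy_thin)
```

Listing the columns you want to keep, as in `select(Species, Distance)`, gives the same result here.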
You'll need to `filter()` your data (see \@ref(r-dplyr-filter)) so that it only contains a single habitat when doing these comparisons.
Let's start with the old quarry; once you get everything working, you should copy and modify the code to work for the other two habitats.
You should also read up on the pipe (`|>`), as it is important for making sequences of steps (\@ref(r-dplyr-pipe)). Note that the native pipe requires R 4.1 or later.
    ## Relative Abundance - Q ####
    data_Q <- succession_data_thin |>
      filter(Habitat == "Q")
Comment lines ending with `####` are a good way to organize different sections of your code. RStudio will allow you to collapse the code underneath it (until the next section), so you can navigate your document more easily.
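If the pipe is new to you, the sketch below shows that piping a data frame into `filter()` gives the same result as passing it as the first argument. The data are made up, and the habitat codes other than "Q" are hypothetical.

```r
library(dplyr)

# Made-up data; habitat codes "X" and "Y" are invented for this example
toy <- data.frame(
  Habitat = c("Q", "Q", "X", "Y"),
  Species = c("Oak", "Elm", "Oak", "Ash")
)

# With the pipe: read as "take toy, then keep rows where Habitat is Q"
piped <- toy |>
  filter(Habitat == "Q")

# Without the pipe: the same call written inside-out
nested <- filter(toy, Habitat == "Q")

identical(piped, nested)
```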
For each habitat, you'll need to create a bar plot comparing the relative abundances of canopy and sapling species. It should look something like this:
First, you should create a data frame that has columns Species, Tree_type (canopy/sapling), and count (the number of trees of that species/type).
You can create this from `data_Q` using a combination of `summarise()` (\@ref(r-dplyr-summarize)) and `group_by()` (\@ref(r-dplyr-group)).
Your code should look something like this:
    # Get the number of sapling and canopy trees of each species for the Quarry habitat
    canopy_sapling_counts_Q <- data_Q |>
      group_by(Tree_type, Species) |>
      summarize(count = n()) # n() gets the number of rows within each group
Next, we need to calculate relative abundances of each species for canopy and sapling trees.
You can do this by dividing `count` by the total count within the tree-type category.
The best way to do this is by combining `group_by()` with `mutate()` (\@ref(r-dplyr-mutate)).
Within the mutate command, you'll want to use the `sum()` function.
    # Calculate relative abundance
    rel_abundance_Q <- canopy_sapling_counts_Q |>
      group_by() |> # Fill this out
      mutate(relative_abundance = ) # fill this out
    # To verify you did it correctly, run this:
    rel_abundance_Q |>
      summarise(total = sum(relative_abundance)) # You should get a total of 1 for each tree type
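To see how a grouped `mutate()` behaves before filling in the template, here is a sketch on an unrelated, made-up data set of fruit counts in two orchards. All names and numbers are invented; the point is only that `sum()` is computed within each group rather than over the whole column.

```r
library(dplyr)

# Made-up counts, unrelated to the succession data
toy_counts <- data.frame(
  Orchard = c("North", "North", "South", "South", "South"),
  Fruit   = c("apple", "pear", "apple", "pear", "plum"),
  count   = c(30, 10, 5, 10, 5)
)

toy_shares <- toy_counts |>
  group_by(Orchard) |>                  # sum() below is evaluated per orchard
  mutate(share = count / sum(count)) |>
  ungroup()

# Check: the shares within each orchard should add up to 1
toy_shares |>
  group_by(Orchard) |>
  summarise(total = sum(share))
```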
One last detail: by default, categorical data (like species identities) are ordered alphabetically along the x axis.
It is better to change this ordering into something informative.
For this figure, we will sort the data by decreasing canopy abundance, then convert `Species` into a factor (discussed in Section \@ref(r-stats-factor)).
To do this, you'll need to use `arrange()`, `mutate()`, and `fct_inorder()`.
    # Sort species by relative canopy abundance
    plot_data_Q <- rel_abundance_Q |>
      arrange( , ) |> # You'll need to arrange by TWO columns;
      mutate(Species = fct_inorder(Species)) # Convert species into a factor based on its current sorting
View your final plot data; you should have a data frame with canopy trees on top, and decreasing relative abundances.
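The effect of `fct_inorder()` is easiest to see on a tiny example. The sketch below uses made-up fruit data and sorts by a single column only; it shows that the factor levels follow the row order at the moment of conversion instead of the alphabet.

```r
library(dplyr)
library(forcats)

# Made-up data, unrelated to the succession data
toy <- data.frame(
  Fruit = c("apple", "pear", "plum"),
  share = c(0.2, 0.5, 0.3)
)

toy_sorted <- toy |>
  arrange(desc(share)) |>             # largest share first
  mutate(Fruit = fct_inorder(Fruit))  # levels follow the current row order

levels(toy_sorted$Fruit)  # "pear" "plum" "apple"
```

Because ggplot draws discrete axes in level order, a plot of `toy_sorted` would put pear first.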
We'll be making a bar graph using `ggplot`; I'd recommend reading all of Chapter \@ref(r-ggplot), with particular focus on Section \@ref(r-ggplot-discx-discy).
    plot_rel_abund_Q <- ggplot(plot_data_Q) +
      aes() + # You need to assign x, y, and fill aesthetics to different columns in plot_data_Q
      geom_col(position = "dodge") +
      theme_classic() + # Remove gridlines
      scale_fill_viridis_d("Tree Age Class") + # changes the colors of the bars;
      xlab("Name Your X Axis") + #
      ylab("Name Your Y Axis") + # You need to adjust these
      theme(axis.text.x = element_text(angle = -25, hjust = .7, vjust = 0)) # Adjust the x axis text angle
    plot_rel_abund_Q # run this to view the plot
See the chapter for details of how these arguments work (you can also see the ggplot2 website for more information).
I would recommend changing the `scale_fill_` to a different option so you can customize your colors (\@ref(r-ggplot-aes-color)).
To save your plot as an image, do this:
    ggsave("figures/succession_rel_abundance_Q.png", # File name
           plot_rel_abund_Q, # plot
           dpi = 300, # keeps a high resolution; don't change this
           width = 7, height = 5 # Width & height in inches; feel free to change as needed
    )
Be sure to inspect the saved image to ensure that it looks right.
We want to take the five most abundant species in each habitat and create a contingency table.
First, let's find the five most abundant species.
Create a data frame with columns `Species` and `total_count` using a grouped `summarize()` operation, then sort it by decreasing count with `arrange()`.
You'll probably want to start with `data_Q`.
Save the result as `total_counts_Q`.
To get the five most common species, you can extract the `Species` column and subset it.
    top_five_spp_Q <- total_counts_Q$Species[1:5]
Filter `data_Q` so that it only includes species in the top 5, using the `%in%` operator, then use `select()` and `table()` to create a contingency table (\@ref(r-stats-cont-tbl)).
Save the result as `contingency_tbl_Q`.
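The `%in%`, `select()`, and `table()` combination is sketched below on made-up data about fruit sizes. The column names and values are invented; the pattern is what matters, not the data.

```r
library(dplyr)

# Made-up data, unrelated to the succession data
toy <- data.frame(
  Fruit = c("apple", "apple", "apple", "pear", "pear", "plum", "fig"),
  Size  = c("big", "big", "small", "big", "small", "small", "big")
)

keep <- c("apple", "pear")  # the categories we want to retain

toy_tbl <- toy |>
  filter(Fruit %in% keep) |>  # TRUE for rows whose Fruit appears in `keep`
  select(Fruit, Size) |>      # the two variables to cross-tabulate
  table()                     # counts for every Fruit x Size combination

toy_tbl
```

Passing a two-column data frame to `table()` gives a two-way table with one row per level of the first column and one column per level of the second.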
Finally, you'll need to run a chi-squared test or Fisher's exact test (\@ref(r-stats-chisq)) on `contingency_tbl_Q`. Use the chi-squared test if all of the cells of the table have a count of 5 or more; otherwise, go with Fisher's test.
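The decision rule above can be written as a short function. This is only a sketch: `choose_test()` is a hypothetical helper, the counts are made up, and the rule is applied to the observed cell counts exactly as stated in the text.

```r
# Made-up 2 x 3 table of counts; the labels are invented for illustration
toy_tbl <- matrix(
  c(12, 7, 9, 3, 8, 6),
  nrow = 2,
  dimnames = list(
    Tree_type = c("canopy", "sapling"),
    Species   = c("sp_A", "sp_B", "sp_C")
  )
)

# Hypothetical helper implementing the rule from the text
choose_test <- function(tbl) {
  if (all(tbl >= 5)) {
    chisq.test(tbl)   # every cell has a count of 5 or more
  } else {
    fisher.test(tbl)  # at least one cell has a count below 5
  }
}

result <- choose_test(toy_tbl)
result$method   # which test was run
result$p.value
```

Here one cell holds a count of 3, so the function falls back to Fisher's exact test.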
Don't forget to do all of this for the other two habitat types. Copy the code you've developed above and modify it.
Density (according to the point-quarter system) is a property of each point (in this case, each `point_number`) and is only relevant for canopy trees.
First, filter `succession_data_thin` so that it only contains canopy trees (save it as `canopy_data`).
Now, we have 2 to 4 rows for each point, and we want to get one value per point (the density).
This suggests that a (grouped) summarize operation is what we need.
As defined in Section \@ref(lab-succession-density), the point-quarter density estimate is:
$$\frac{1}{\text{mean} (x^2)}$$
where $x$ is `Distance`.
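As a worked example of the formula, suppose a single point has four canopy trees at the made-up distances 2, 4, 4, and 6. The squared distances are 4, 16, 16, and 36, their mean is 18, and the density estimate is 1/18, or about 0.056 trees per squared distance unit. The same arithmetic on a plain vector:

```r
# Made-up distances for one point (not from the real data set)
x <- c(2, 4, 4, 6)

mean_sq <- mean(x^2)         # (4 + 16 + 16 + 36) / 4 = 18
density_est <- 1 / mean_sq   # 1 / 18

density_est
```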
Use this template to calculate density:
    density_data <- canopy_data |>
      group_by( , ) |> # you'll need two grouping factors
      summarize(density = ) # Note that ^2 is how you square something
Finally, let's make a figure showing the distribution of density values at each habitat (any of the Discrete X, Continuous Y plot styles in Section \@ref(r-ggplot-discx-conty) will work).