knitr::opts_chunk$set( eval = TRUE, echo = TRUE, collapse = TRUE, comment = "#>" ) options(fgeo.quiet = TRUE)
This article shows some of the key features of fgeo applied to an exploratory data analysis. For a deeper and general approach to exploratory data analysis, see this book section. A version adapted for ForestGEO is available here.
In every new R session you need to "open" fgeo with library()
.
library(fgeo)
You may suppress the start-up message with suppressPackageStartupMessages()
or add options("fgeo.quiet" = TRUE)
to .Rprofile (see usethis::edit_r_profile()
).
You may use you own data but fgeo comes with some example datasets.
fgeo_help("datasets")
We will start with a dataset of stems censused in one hectare from the forest plot in Luquillo, Puerto Rico (https://forestgeo.si.edu/sites/north-america/luquillo).
stem <- download_data("luquillo_stem_random") str(stem)
This dataset comes with multiple censuses but we will pick the latest one.
unique(stem$CensusID) stem <- filter(stem, CensusID %in% max(unique(stem$CensusID))) unique(stem$CensusID)
For a description of the columns, see ?data_dictionary
.
data_dictionary <- download_data("data_dictionary") str(data_dictionary) cols <- names(stem) filter(data_dictionary, column %in% cols)
Two columns that are commonly useful in ForestGEO datasets are status
and dbh
(diameter at breast height). We will begin by better understanding what type of variables they are. For this, base R provides useful functions.
status
is a categorical variable.
str(stem$status)
We can count the number of observations in each category with table()
, then visualize the result with barplot()
.
by_category <- table(stem$status) barplot(by_category)
dbh
is a continuous numeric variable.
str(stem$dbh)
(Note the missing values (NA
s).)
And we can visualize its distribution with hist()
.
hist(stem$dbh)
Unfortunately hist()
dropped missing values silently. But we can better understand how missing values of dbh
relate to status
by extracting only the columns dbh
and status
, and picking only the rows where dbh
is missing.
dbh_status <- stem[c("dbh", "status")] missing <- filter(dbh_status, is.na(dbh)) unique(missing)
Another approach is to count missing values.
missing <- transform(stem, na = ifelse(is.na(dbh), TRUE, FALSE)) table(missing$na, missing$status)
We learn that dbh
is missing where a tree is dead (status = D
) or gone (status = G
). This makes sense and, depending on the type of analysis we want to do, we may want to keep or remove missing values.
Now we are ready to clean the data. For example, we can pick trees which status is "A" (alive). At ForestGEO, working with status
is so common that fgeo provides a specialized function.
fgeo_help("status")
In stem
, the variable status
records the status of each individual stem. How can we determine the status of a tree based on the status of each of its stems? That is the job of add_status_tree()
.
stem <- add_status_tree(stem, status_a = "A", status_d = "D") alive_trees <- filter(stem, status_tree == "A") # Note that alive trees may have some missing, gone or dead stems some_cols <- c( "treeID", "status_tree", "stemID", "status") example_tree <- 46 example_rows <- filter(alive_trees, treeID == example_tree) select(example_rows, some_cols)
dbh
rangeAnother very common task when working with ForestGEO data is to pick stems of a particular dbh
range.
fgeo_help("dbh")
Pick stems of 10 mm or more.
ten_plus <- pick_dbh_min(alive_trees, 10) range(ten_plus$dbh, na.rm = TRUE)
Calculate the total abundance of stems and trees.
# Drop missing values of `dbh` non_missing <- filter(ten_plus, !is.na(dbh)) # Stem abundance abundance(non_missing) # Tree abundance (picking main stems -- with highest `hom` and largest `dbh`) main_stems <- pick_main_stem(non_missing) abundance(main_stems)
Calculate the abundance of trees by species.
by_sp <- group_by(main_stems, sp) n_by_sp <- abundance(by_sp) n_by_sp
What are the three most abundant tree species?
(top3_species <- arrange(n_by_sp, desc(n))[["sp"]][1:3])
Now we can pick the alive_trees
of only the top3
species.
picked_stems <- filter(alive_trees, sp %in% top3_species)
fgeo includes some functions specialized in mapping ForestGEO's data.
fgeo_help("map")
Map the most abundant species.
example_elevation <- fgeo.x::elevation species_elevation <- sp_elev(picked_stems, example_elevation) autoplot(species_elevation)
Tweak to focus on the hectare available in the data.
autoplot(species_elevation, xlim = c(100, 200), ylim = c(400, 500))
In this section we will use two other datasets.
census5 <- download_data("luquillo_tree5_random") census6 <- download_data("luquillo_tree6_random") recruitment_ctfs(census5, census6)
With as_tibble()
we convert the result of any demography function to a tibble -- a convenient dataframe.
as_tibble( recruitment_ctfs(census5, census6) )
We can aggregate results by any number of variables:
split1
.as_tibble(recruitment_ctfs(census5, census6, split1 = census5$sp))
interaction()
(but aggregating by more than one variable may be slow and confusing).sp_quadrat <- interaction(census5$sp, census5$quadrat) by_many <- recruitment_ctfs(census5, census6, split1 = sp_quadrat) as_tibble(by_many)
To separate the multiple groups we can use tidyr::separate()
.
tidyr::separate(as_tibble(by_many), groups, into = c("species", "quadrats"))
The same works for mortality and growth.
as_tibble(mortality_ctfs(census5, census6, split1 = sp_quadrat)) as_tibble(growth_ctfs(census5, census6, split1 = sp_quadrat))
In this section we will use a tree and elevation data, and we will create habitat data from the elevation data.
tree5 <- download_data("luquillo_tree5_random") elevation <- download_data("luquillo_elevation") # Pick alive trees, of 10 mm or more census <- filter(tree5, status == "A", dbh >= 10) # Pick sufficiently abundant species pick <- filter(add_count(census, sp), n > 50) # Use your habitat data or create it from elevation data habitat <- fgeo_habitat(elevation, gridsize = 20, n = 4)
We have everything we need. We can now calculate species-habitat associations with tt_test()
.
# A list or matrices tt_test_result <- tt_test(pick, habitat) tt_test_result
Finally, here are some ways you may visualize the results.
# A simple summary to help you interpret the results summary(tt_test_result) # A combined matrix Reduce(rbind, tt_test_result) # A dataframe as_tibble(tt_test_result)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.