knitr::opts_chunk$set( collapse = TRUE, comment = "#>", rmarkdown.html_vignette.check_title = FALSE )
library(plantTracker) library(ggplot2) library(sf) # library(devtools) # load_all()
Welcome to plantTracker
! This package was designed to transform long-term
quadrat maps that show plant occurrence and size into demographic data that can
be used to answer questions about population and community ecology.
This vignette is designed to help you use plantTracker
functions to move from a
spatially referenced dataset containing plant cover data, to an output dataset
that contains growth and survival data for each observed individual.
This vignette will walk you through the steps to get from maps of plant cover to a demographic dataset that you can use for analysis!
The functions in plantTracker
require data in a specific format. plantTracker
includes an example dataset that consists of two pieces: grasslandData
and
grasslandInventory
. You can load these example datasets into your global
environment by calling data(grasslandData)
and data(grasslandInventory)
. You
can view the documentation for these datasets by calling
?grasslandData
and ?grasslandInventory
.
Most plantTracker
functions require two data objects. The first is a data frame
that contains the location and metadata for each mapped individual, which we
from now on will call dat
. The second is a list that contains a vector of
years in which each quadrat was sampled, which we from now on will cal inv
.
Below are the basic requirements for these data objects.
dat
data frame must . . .sf
data.frame. More on this below in section 1.1.1...sf
package data
format) that contains a polygon representing the location of each observation.
Each observation must be a POLYGON
or MULTIPOLYGON
. Data cannot be stored as
POINTS
.dat
does not need to have a coordinate reference system (i.e. CRS can
be "NA"), but it can have one if you'd like. plantTracker
functions. plantTracker
functions. Here are the first few rows of a possible dat
input data.frame:
dataShow <- head(grasslandData[, !names(grasslandData) %in% c("Clone", "Seedling", "Stems", "Basal", "sp_code_4", "Area")]) rownames(dataShow) <- NULL (dataShow) #knitr::kable(dataShow, caption = "**Table 1.1**: Example `dat` data.frame" )
plantTracker
functions. Here's what some of the example dat
data (from the "SG2" quadrat at the
"AZ" site in 1922) look like when plotted spatially:
exampleDat <- grasslandData[grasslandData$Site == "AZ" & grasslandData$Quad == "SG2" & grasslandData$Year == 1922, ] dat <- grasslandData[grasslandData$Site == "AZ" & grasslandData$Quad == "SG2",] ggplot(data = exampleDat) + geom_sf(aes(color = Species, fill = Species)) + geom_segment(aes(x = 0, xend = 1, y = 0, yend = 0), size = .5, lineend = "round", color = "grey30") + geom_segment(aes(x = 0, xend = 1, y = 1, yend = 1), size = .5, lineend = "round", color = "grey30") + geom_segment(aes(x = 0, xend = 0, y = 0, yend = 1), size = .5, lineend = "round", color = "grey30") + geom_segment(aes(x = 1, xend = 1, y = 0, yend = 1), size = .5, lineend = "round", color = "grey30") + xlab("quadrat horizontal edge (m)") + ylab("quadrat vertical edge (m)") + theme_classic() + theme(axis.line = element_blank(), legend.text = element_text(face = "italic"), plot.margin = margin(1,0,1,0))
It's important to note that, while plantTracker
was designed to be used with
small-scale maps of plant occurrence in quadrats, it is possible to
use other styles of map data in plantTracker
functions. All that is required is
a single mapped basal area (or point location converted to a small polygon) at
each time point for each organism (or ramet), and is accompanied by the required
metadata detailed above. For example, plantTracker
functions could be used to
estimate tree demographic rates at the scale of 100 m x 50 m plots.
sf
data formatAs mentioned above, plantTracker
uses the sf
R package to deal with spatial
data. The map data that plantTracker
was built to analyze is inherently spatial,
so you need to know how to the basics of dealing with spatial data in R if you
want to use plantTracker
! There are many good resources to help you orient
yourself to working with spatial data in R generally:
And the sf
package more specifically:
These resources provide a great orientation, and while I recommend looking over
them if you're new to working with spatial data in R, I've included a brief
tutorial for uploading shapefiles into R as sf
data frames.
Most of the published chart-quadrat datasets have the map data stored as
shapefiles in complex file structures, which can be a bit confusing to navigate.
plantTracker
requires all of your data (for all species, plots and years) to be
in one single data frame. This example shows how you might navigate through a
complex file structure to to pull out shapefiles and put them into one single
sf
data frame. for further analysis with plantTracker
.
For this example, I'll use a subset of the data from the Santa Rita Experimental
Range in Arizona, which has been published in this data paper. In this dataset, shapefiles for each quadrat are stored in their own folder.
Within that folder there are two shapefiles for each year: one that contains map
data for polygons, and one that contains data for points. The following code
reads in those shapefiles, transforms the points to polygons of a fixed radius,
and puts all the data into one sf
data frame. If you want to follow along,
download the "shapefiles.zip" file from the data paper, un-zip it, and name it "AZ_shapefiles". The dataset that is the result of this example is the same as
part of the "grasslandData" dataset included in plantTracker
.
# get the working directory name wdName <- c("/Users/Alice/Dropbox/Grad School/Research/Trait Project/Data/ChartQuadDatasets/AZs_downloaded data/")
# save a character vector of the file names in the file that contains the # shapefiles (in this case, called "CO_shapefiles"), each of which is a quadrat # note: 'wdName' is a character string indicating the path of the directory # containing the 'AZ_shapefiles' folder quadNames <- list.files(paste0(wdName,"AZ_shapefiles/")) # trim the quadrats down to 2, for the sake of runtime in this example quadNames <- quadNames[quadNames %in% c("SG2", "SG4")] # now we'll loop through the quadrat folders to download the data for (i in 1:2){#length(quadNames)) { # get the names of the quadrat for this iteration of the loop quadNow <- quadNames[i] # get a character vector of the unique quad/Year combinations of data in # this folder that contain polygon data quadYears <- quadYears <- unlist(strsplit(list.files( paste0(wdName, "AZ_shapefiles/",quadNow,"/"), pattern = ".shp$"), split = ".shp")) # loop through each of the years in this quadrat for (j in 1:length(quadYears)) { # save the name of this quadYear combo quadYearNow <- quadYears[j] # read in the shapefile for this quad/year combo as an sf data frame # using the 'st_read()' function from the sf package shapeNow <- sf::st_read(dsn = paste0(wdName,"AZ_shapefiles/",quadNow), # the 'dsn' argument is the folder that # contains the shapefile files--in this case, # the folder for this quadrat layer = quadYearNow) # the 'layer' argument has the # name of the shapefile, without the filetype extension! This is because each # shapefile consists of at least three separate files, each of which has a # unique filetype extension. # the shapefiles in this dataset do not have all of the metadata we # need, and have some we don't need, so we'll remove what we don't need and # add columns for 'site', 'quad', and 'year' shapeNow$Site <- "AZs" shapeNow$Quad <- quadNow # get the Year for the shapefile name--in this case it is the last for # numbers of the name shapeNow$Year <- as.numeric(strsplit(quadYearNow, split = "_")[[1]][2]) + 1900 # determine if the 'current' quad/year contains point data or polygon data if (grepl(quadYearNow, pattern = "C")) { # if quadYearNow has point data # remove the columns we don't need shapeNow <- shapeNow[,!(names(shapeNow) %in% c("Clone", "Seedling", "Area", "Length", "X", "Y"))] # reformat the point into a a very small polygon # (a circle w/ a radius of .003 m) shapeNow <- sf::st_buffer(x = shapeNow, dist = .003) # add a column indicating that this observation was originally # mapped as a point shapeNow$type <- "point" } else { # if quadYearNow has polygon data # remove the columns we don't need shapeNow <- shapeNow[,!(names(shapeNow) %in% c("Seedling", "Canopy_cov", "X", "Y", "area"))] # add a column indicating that this observation was originally # mapped as a polygon shapeNow$type <- "polygon" } # now we'll save this sf data frame if (i == 1 & j == 1) { # if this is the first year in the first quadrat dat <- shapeNow } else { # if this isn't the first year in the first quadrat, simply rbind # the shapeNow sf data frame onto the previous data dat <- rbind(dat, shapeNow) } } } # Now, all of the spatial data are in one sf data frame! # for the sake of this example, we'll remove data for some species and years in order to make the example run faster (and to make this 'dat' data.frame identical to the "grasslandData" dataset included in this R pakcage). dat <- dat[dat$Species %in% c("Heteropogon contortus", "Bouteloua rothrockii", "Ambrosia artemisiifolia", "Calliandra eriophylla", "Bouteloua gracilis", "Hesperostipa comata", "Sphaeralcea coccinea", "Allium textile"),] dat <- dat[ (dat$Quad %in% c("SG2", "SG4") & dat$Year %in% c(1922:1927)),]
In some spatial datasets, observations that were measured as "points" in the field are still stored as "points" in the shapefiles. plantTracker
requires all observations to be stored as "polygon" geometry in order to streamline functions, so we need to translate "points" into small polygons of a fixed area. In this case, we'll transform them into circles with a radius of 1 cm (.01, since this dataset measures area in meters). 'dat' has a column called "type." A value of "point" in this column will tell us that, even though the geometry of the "point" data is now in "polygon" format, the values for basal area and growth are not indicative of the true size of the plant.
# We use the function "st_buffer()" to add a buffer of our chosen radius (.01) around each point observation, which will transform each observation into a circle of the "polygon" format with a radius of .01. dat_1 <- st_buffer(x = dat[st_is(x = dat, type = "POINT"),], dist = .01) dat_2 <- dat[!st_is(x = dat, type = "POINT"),] dat <- rbind(dat_1, dat_2)
If you don't want to download the data and format it into an sf data.frame, you can also use a subset of the "grasslandData" data object stored in this R package. You will just need to subset it to include only the data from the "AZ" site. Code to do this is below:
dat <- grasslandData[grasslandData$Site == "AZ",]
inv
list must . . .dat
. There cannot be two elements with the same name, and
there cannot be an element with more than one quadrat in its name. There must be
an element for each quadrat with data in dat
. dat
(i.e. if year is a four-digit number in dat
, then
it must be a four-digit number in inv
). Make sure this is the years the
quadrat was actually sampled, not just the years that have data in the dat
data frame! This argument allows the function to differentiate between years when
the quadrat wasn't sampled and years when there just weren't any individuals of
a species present in that quadrat. If a quadrat wasn't sampled in a given year,
don't put an 'NA' in inv
for that year! Instead, just skip that year. Here is an example of an inv
argument that corresponds to the example dat
argument above. The quadrats that have data in dat
are "SG2" and "SG4", so
there are elements in inv
that correspond to each of these quadrats.
(inv <- grasslandInventory[c( "SG2", "SG4")])
If you already have a quadrat inventory as a data frame, it isn't complicated to
reformat it to work with plantTracker
functions.
For example, if your quadrat inventory data frame looks like this... :
data.frame("quad1" = c(2000, 2001, NA, 2003, 2004, 2005, 2006, 2007), "quad2" = c(2000:2007), "quad3" = c(2000, NA, 2002, 2003, 2004, 2005, 2006, 2007))
... then do the following to get it into a format ready for plantTracker
:
quadInv_DF <- data.frame("quad1" = c(2000, 2001, NA, 2003, 2004, 2005, 2006, 2007), "quad2" = c(2000:2007), "quad3" = c(2000, NA, 2002, 2003, 2004, 2005, 2006, 2007)) # use the 'as.list()' function to transform your data frame into a named list quadInv_list <- as.list(quadInv_DF) # we still need to remove the 'NA' values, which we can do using the # 'lapply()' function (quadInv_list <- lapply(X = quadInv_list, FUN = function(x) x[is.na(x) == FALSE]))
inv
and dat
arguments using checkDat()
The generic checkDat()
function:
checkDat(dat, inv = NULL, species = "Species", site = "Site", quad = "Quad", year = "Year", geometry = "geometry", reformatDat = FALSE, ...)
This step is optional, but can be useful if you're unsure whether your dat
and
inv
arguments are in the correct format. The plantTracker
function
checkDat()
takes dat
and inv
as arguments for the arguments dat
and
inv
, and will return informative error messages if either argument is not in
the correct format.
Additional optional arguments to checkDat()
are species
, site
, quad
,
year
, geometry
, and reformatDat
.
species/site/quad/year/geometry
These arguments only need to be included if
the columns in dat
that contain the data for species, site,
quadrat, year and geometry of each observation are different from the names
"Species", "Site", "Quad", "Year, and "geometry". For example, if the column in
your version of dat
that contains the species identity of each observation is
called "species_names", then the argument species = "species_names"
must be
included in your call to checkDat()
. reformatDat
is a TRUE/FALSE argument that determines whether you want the
checkDat()
function to return a version of dat
that is ready to go into the
steps of this workflow. If reformatDat
= TRUE then checkDat()
will return a
list that contains the reformatted version of dat
, the reformatted version of
inv
and an additional element called "userColNames", which contains the column
names in the input version of dat
that are different from the expected column
names of "Species", "Site", "Quad", "Year, and "geometry" (if there are any). If
reformatDat
= TRUE, then checkDat()
will return a message indicating that
your data is ready for the next step. The default value is FALSE.trackSpp()
Now it's time to transform your raw dataset into demographic data! This is
accomplished using the trackSpp()
function. This function follows individual
plants from year to year in the same quadrat to determine survival, size in the
next year, age, and some additional potentially-useful demographic data. It does
this by comparing quadrat maps from sequential years. If there is overlap of
individuals of the same species in consecutive years, then the rows in dat
that contain data for those overlapping individuals are given the same
"trackID", or unique identifier.
Here is the generic trackSpp()
function:
trackSpp(dat, inv, dorm, buff, buffGenet, clonal, species = "Species", site = "Site", quad = "Quad", year = "Year", geometry = "geometry", aggByGenet = TRUE, printMessages = TRUE, flagSuspects = FALSE, shrink = 0.1, dormSize = 0.05, ...)
trackSpp()
takes the following arguments:
dat
This is the sf
data frame that we've been calling dat
so far. This
must be in the correct format (which you can check before-hand using
checkDat()
), but informative error messages will be returned if it is
incorrect. It must have the columns outlined in Section 1.1, but they can have
different names as long as those names are included in this function call
(more on that later...).inv
This is the list of quadrat sampling years we've been calling inv
. If
it is not in the correct format or does not contain data for the correct
quadrats, then an informative error message will be returned. dorm
This is a positive integer value that indicates how long you want the
function to allow an individual to be "dormant". In this case, dormancy can be
interpreted as the biological phenomenon where a plant has above-ground tissue
present in year 1, is alive underground but with no above-ground tissue in year
2, and then has above-ground tissue in a subsequent year. Dormancy can also be
interpreted here as data-collection error, whereby an individual is accidentally
not mapped in between years where it was recorded. Consider the following example: There is a polygon of species "A" in year 1,
which is our "focal individual". In year 2, there is not a polygon of species
"A" that overlaps with our focal individual. In year 3, there is a polygon of
species "A" that is in the same location as our focal individual. If dorm = 0
,
then our focal individual would get a 0 in the survival column, and the polygon
of species "A" in year 3 would be considered a new recruit and get a new
trackID. If dorm = 1
, because there is overlap between two polygons of the same
species with only a 1-year gap between when they occur, these two polygons will
be considered the same genetic individual, will have the same trackID, and our
focal individual will have a "1" in the survival column. In an alternative
scenario, in years 3 and 4 there are not polygons of species "A" that are in the
same location as our focal individual, but there is a polygon in year 4 that
overlaps our focal individual. If dorm = 1
, then our focal individual would
get a "0" for survival, but if dorm = 2
, then it would get a 1 for survival.
exampleSmall <- exampleDat[unique(c(30, 42, 44, 25, 2, 61, 59, 52, 45, 8, 42, 37, 60, 34, 45, 38, 26, 49, 52, 46)),] exampleSmall$FocalInd <- 'other individual' exampleSmall[10,'FocalInd'] <- 'focal individual' exampleSmall$Year <- "Year 1" exampleSmall_2 <- exampleSmall[c(1:9,11:nrow(exampleSmall)),] exampleSmall_2$Year <- "Year 2" exampleSmall_3 <- exampleSmall exampleSmall_3$Year <- "Year 3" exampleSmall <- rbind(exampleSmall, exampleSmall_2, exampleSmall_3) ggplot(data = exampleSmall) + geom_sf(aes(color = FocalInd, fill = FocalInd)) + geom_segment(aes(x = 0, xend = .6, y = 0, yend = 0), size = .5, lineend = "round", color = "grey30") + geom_segment(aes(x = 0, xend = .6, y = .6, yend = .6), size = .5, lineend = "round", color = "grey30") + geom_segment(aes(x = 0, xend = 0, y = 0, yend = .6), size = .5, lineend = "round", color = "grey30") + geom_segment(aes(x = .6, xend = .6, y = 0, yend = .6), size = .5, lineend = "round", color = "grey30") + xlab("quadrat horizontal edge (m)") + ylab("quadrat vertical edge (m)") + xlim(c(0,.6)) + ylim(c(0,.6)) + #labs(title = Year) + facet_wrap( ~ Year) + theme_classic() + theme(axis.line = element_blank(), legend.text = element_text(face = "italic"), plot.margin = margin(1,0,1,0), legend.title = element_blank()) + scale_fill_discrete(type = c("#E69F00", "#A6A6A6", "#009E73")) + scale_color_discrete(type = c("#E69F00", "#A6A6A6", "#009E73"))
If you'd like to be more specific and perhaps biologically accurate, you can
also specify the dorm
argument uniquely for each species. For example, it
might be that you are confident that your data collectors did not accidentally
"miss" any individuals, and your dat
data frame contains observations for
shrubs or trees, which are very unlikely to go dormant, and small forbs, which
are much more likely to go dormant for one or two years. In order to disallow
dormancy for trees and shrubs, but to allow dormancy for forbs, you will
provide a data frame to the dorm
argument instead of a single positive
integer value. There will be two columns: 1) a "Species" column that has the
species name for each species present in dat
, and 2) a column called "dorm"
that has positive integer values indicating the dormancy you'd like to allow
for each species. Make sure that if you are following the data frame approach,
you must provide a dormancy argument for every species that has data in
dat
. Make sure that the species names in the dorm
data frame are spelled
exactly the same as they are in dat
. The data frame should look something
like this:
dormDF <- data.frame("Species" = c("tree A", "shrub B", "tree C", "forb D", "forb E", "forb F"), "dorm" = c(0,0,0,1,2,1)) (dormDF) # knitr::kable(dormDF, caption = "**Table 2.1**: Example 'dorm' data.frame", )
Important Note: Be very careful about how you define the dorm
argument. The
bigger the dorm
argument, the more likely you are to overestimate survival.
For annually-sampled data, I would need a very biologically-compelling reason
to to specify a dorm
argument greater than 1 year.
buff
This is a positive numeric value that indicates how much an individual
can move from year 1 to year 2 and still be considered the same individual
(receive the same trackID). In addition to accounting for true variation in
location of a plant's stem from year to year, this argument also accounts for
small inconsistencies in mapping from year to year. The buff
argument must be
in the same units as the spatial values in dat
. For example, if the spatial
data in dat
is measured in meters, and you want to allow a plant to "move"
15 cm between year 1 and year 2, then you would include the argument
buff = .15
in your call to trackSpp()
. If you want to allow no movement, use
buff = 0
. Below is a visualization of two different buff
scenarios.
exampleDat <- grasslandData[grasslandData$Site == "AZ" & grasslandData$Quad == "SG2" & grasslandData$Year %in% c(1922,1923), ] exampleDatIDsTemp <- exampleDat[exampleDat$Species == "Bouteloua rothrockii",] exampleDatIDsTemp <- exampleDatIDsTemp[round(exampleDatIDsTemp$Area,7) %in% round(c( 0.0005471808, 0.0005321236),7),] exampleDatIDsTemp$ghost <- "observation from current year" exampleBuffed <- st_buffer(exampleDatIDsTemp[round(exampleDatIDsTemp$Area, 7) ==0.0005472,], dist = .10) exampleBuffed$Year <- 1922 exampleBuffed$ghost <- "10 cm buffer" exampleBuffedNext <- exampleBuffed exampleBuffedNext$Year <- 1923 ghost <- exampleDatIDsTemp[round(exampleDatIDsTemp$Area, 7) == 0.0005472,] ghost$Year <- 1923 ghost$ghost <- "polygon location in previous year" exampleDatIDs <- rbind(exampleDatIDsTemp, ghost, exampleBuffed, exampleBuffedNext ) ggplot(data = exampleDatIDs ) + geom_sf(aes(fill = ghost, alpha = ghost, color = ghost, lty = ghost)) + geom_segment(aes(x = .6, xend = 1, y = .4, yend = .4), size = .5, lineend = "round", color = "grey30") + geom_segment(aes(x = .6, xend = 1, y = .8, yend = .8), size = .5, lineend = "round", color = "grey30") + geom_segment(aes(x = .6, xend = .6, y = .4, yend = .8), size = .5, lineend = "round", color = "grey30") + geom_segment(aes(x = 1, xend = 1, y = .4, yend = .8), size = .5, lineend = "round", color = "grey30")+ xlab("quadrat horizontal edge (m)") + ylab("quadrat vertical edge (m)") + #labs(title = Year) + facet_wrap( ~ Year) + theme_classic() + theme(axis.line = element_blank(), legend.text = element_text(face = "italic"), #plot.margin = margin(1,0,1,0), legend.title = element_blank()) + scale_fill_discrete(type = c("#E69F00", "#009E73", "#A6A6A6")) + scale_color_discrete(type = c("#E69F00", "#009E73", "#A6A6A6")) + scale_alpha_discrete(range = c(0, 1, .5)) + scale_linetype_manual(values=c("twodash", "solid", "dotted"))
exampleBuffed <- st_buffer(exampleDatIDsTemp[round(exampleDatIDsTemp$Area, 7) ==0.0005472,], dist = .03) exampleBuffed$Year <- 1922 exampleBuffed$ghost <- "3 cm buffer" exampleBuffedNext <- exampleBuffed exampleBuffedNext$Year <- 1923 ghost <- exampleDatIDsTemp[round(exampleDatIDsTemp$Area, 7) ==0.0005472,] ghost$Year <- 1923 ghost$ghost <- "polygon location in previous year" exampleDatIDs <- rbind(exampleDatIDsTemp, ghost, exampleBuffed, exampleBuffedNext ) ggplot(data = exampleDatIDs ) + geom_sf(aes(fill = ghost, alpha = ghost, color = ghost, lty = ghost)) + geom_segment(aes(x = .6, xend = 1, y = .4, yend = .4), size = .5, lineend = "round", color = "grey30") + geom_segment(aes(x = .6, xend = 1, y = .8, yend = .8), size = .5, lineend = "round", color = "grey30") + geom_segment(aes(x = .6, xend = .6, y = .4, yend = .8), size = .5, lineend = "round", color = "grey30") + geom_segment(aes(x = 1, xend = 1, y = .4, yend = .8), size = .5, lineend = "round", color = "grey30") + xlab("quadrat horizontal edge (m)") + ylab("quadrat vertical edge (m)") + #labs(title = Year) + facet_wrap( ~ Year) + theme_classic() + theme(axis.line = element_blank(), legend.text = element_text(face = "italic"), #plot.margin = margin(1,0,1,0), legend.title = element_blank()) + scale_fill_discrete(type = c("#E69F00", "#009E73", "#A6A6A6")) + scale_color_discrete(type = c("#E69F00", "#009E73", "#A6A6A6")) + scale_alpha_discrete(range = c(0, 1, .5)) + scale_linetype_manual(values=c("twodash", "solid", "dotted"))
clonal
This is a logical argument (TRUE or FALSE) that indicates whether you
want to allow plants to be clonal or not. In the context of this type of data,
"clonal" means that one genetic individual (or "genet") can be recorded as
multiple polygons (or "ramets"). If clonal = TRUE
, then multiple polygons in
the same year can be part of the same individual and have the same trackID. If
clonal = FALSE
, then every polygon in a given year is a unique individual and
has a unique trackID. This option can be defined globally for all species
present in dat
by setting clonal
equal to FALSE or TRUE in the trackSpp()
function call. Alternatively, clonal
can be specified uniquely for each
species by creating a data frame that contains a clonal
argument for each
species (analogous to the data frame for the dorm
argument shown in Table 2.1,
but with a column called "clonal"). The following arguments to trackSpp()
are only required in certain contexts.
buffGenet
is an argument that is only required if clonal = TRUE
or if clonal
is a data frame that contains at least a single TRUE
in the
"clonal" column. buffGenet
is a numeric value that indicates how close
polygons of the same species must be to one another in the first year in order
to be considered parts of the same genetic individual (ramets of the same
genet). Similar to buff
, be very careful and conservative when defining this
argument. A large value for buffGenet
can quickly lead to the entire quadrat
being treated as the same genetic individual! I suggest experimenting with
multiple values of buffGenet
, looking at maps that show the trackID
assignment, and deciding on a value that leads to trackID assignments that make
the most biological sense to you. This argument is passed to the
groupByGenet()
function, which assigns the same trackID to individuals that
are within the buffGenet
buffer of each other. Polygons are only grouped by
genet in the first year of data. After that point, grouping by genet happens
based on data in previous years. If there are multiple polygons that overlap
with a genet in the previous year, they are given the same trackID and are
considered ramets belonging to the same genet. The value of buffGenet
must be
greater than or equal to zero, and must be in the same units as the spatial data
in dat
. buffGenet
can be a single numeric value which will be applied to all
species present in dat
, or can be specified uniquely for each species by
creating a data frame that contains a buffGenet
argument for each species
(analogous to the data frame for the dorm
argument shown in Table 2.1, but
with a column called "buffGenet").aggByGenet
is a logical argument that is only required if clonal = TRUE
or if clonal
is a data frame that contains at least a single TRUE
in the
"clonal" column. This argument determines whether the output data frame from
trackSpp()
will have a row for every single ramet, or will be aggregated so
that each genet is represented by a single row. If aggByGenet = FALSE
,
then the output is not aggregated. If aggByGenet = TRUE
(the default
setting), then the results are aggregated using the plantTracker
function
aggregateByGenet()
. This function combines the sf
"POLYGONS" for each ramet
into one sf
MULTIPOLYGON
for the entire genet, and combines the associated
metadata ("Species", "Site", "Quad", "Year", "trackID", "basalArea_genet",
"age", "recruit", "survives_tplus1", "size_tplus1", "nearEdge") into one row for
this genet. Even if the input dat
had additional columns, they will not be
included in the output of trackSpp
if aggByGenet = TRUE
, since it is
uncertain if they can be summed across all ramets or are identical across all
ramets. For example, if each ramet has a unique character string in a column
called "name", there is no easy way to "sum" the character strings in this
column to have one value for each genet. If you want the output data frame from
trackSpp()
to have the same columns as your input dat
data.frame, set the
aggByGenet
argument to FALSE. However, Be Careful, since any demographic
analysis should be done with a data.set that has only one row per genet,
otherwise you will be estimating survival and growth rates on the scale of
ramets instead of genets. If you take the aggByGenet = FALSE
route, be
sure to pass your dataset through the aggregateByGenet()
function (or
aggregate to the genet scale using your preferred method) before
demographic analysis. species/site/quad/year/geometry
These arguments only need to be included if
the columns in dat
that contain the data for species, site,
quadrat, year and geometry of each observation are different from the names
"Species", "Site", "Quad", "Year, and "geometry". For example, if the column in
your version of dat
that contains the species identity of each observation is
called "species_names", then the argument species = "species_names"
must be
included in your call to trackSpp()
. printMessages
This is an optional logical argument that determines whether
this function returns messages about genet aggregation, as well as messages
indicating which year is the last year of sampling in each quadrat and which
year(s) come before a gap in sampling that exceeds the dorm
argument (and thus
which years of data have an "NA" for "survives_tplus1" and "size_tplus1"). If
printMessages = TRUE
(the default), then messages are printed. If
printMessages = FALSE
, then messages are not printed.flagSuspects
This is an optional logical argument of length 1, indicating
whether observations that are "suspect" will be flagged. The default is
flagSuspects = FALSE
. If flagSuspects = TRUE
, then a column called
"Suspect" is added to the output data.frame. Any suspect observations get a
"TRUE" in the "Suspect" column, while non-suspect observations receive a
"FALSE". There are two ways that an observation can be classified as
"suspect". First, if two consecutive observations have the same trackID, but
the observation in year t+1 is less that a certain percentage (defined by the
shrink
arg.) of the observation in year t, it is possible that the
observation in year t+1 is a new recruit and not the same individual. The
second way an observation can be classified as "suspect" is if it is very
small before going dormant. It is unlikely that a very small individual will
survive dormancy, so it is possible that the function has mistakenly given a
survival value of "1" to this individual. A "very small individual" is any
observation with an area below a certain percentile (specified by dormSize
)
of the size distribution for this species, which is generated using all of
the size data for this species in dat
. If you are using the output dataset for
demographic analysis, you may want to exclude "Suspect" observations. If
flagSuspects = FALSE
, then no additional column is added. shrink
This is an optional argument that takes a single numeric value. This
value is only used when flagSuspects = TRUE
. When two consecutive observations
have the same trackID, and the ratio of size_t+1 to size_t is smaller than the
value of shrink
, the observation in year t gets a "TRUE" in the "Suspect"
column. For example, shrink = 0.2
, and an individual that the tracking
function has identified as "BOUGRA_1992_5" has an area of 9 cm$^2$ in year_t and
an area of 1.35 cm$^2$ in year_t+1. The ratio of size_t+1 to size_t is
1.35/9 = 0.15, which is smaller than the cutoff specified by shrink
, so the
observation of "BOUGRA_1992_5" in year t gets a "TRUE" in the "Suspect" column.
The default value is shrink = 0.10
.dormSize
This is an optional argument that takes a single numeric value.
This value is only used when flagSuspects = TRUE
and dorm ≥ 1
. An individual
is flagged as "suspect" if it "goes dormant" and has a size that is less than or
equal to the percentile of the size distribution for this species that is
designated by dormSize
. For example, dormSize = 0.05
, and an individual has
a basal area of 0.5 cm$^2$. The 5th percentile of the distribution of size for
this species, which is made using the mean and standard deviation of all
observations in dat
for the species in question, is 0.6 cm$^2$. This individual
does not have any overlaps in the next year (year_t+1), but does have an overlap
in year_t+2. However, because the basal area of this observation is smaller than
the 5th percentile of size for this species, the observation in year t will get
a "TRUE" in the "Suspect" column. It is possible that the tracking function has
mistakenly assigned a "1" for survival in year_t, because it is unlikely that
this individual is large enough to survive dormancy. The default value is
dormSize = .05
.These are all of the possible arguments to trackSpp()
!
Below is an example of a potential function call to trackSpp()
, using
the example dat
and inv
data we've used so far. :
datTrackSpp <- plantTracker::trackSpp(dat = dat, inv = inv, dorm = 1, buff = .05, buffGenet = .005, clonal = data.frame("Species" = c("Heteropogon contortus", "Bouteloua rothrockii", "Ambrosia artemisiifolia", "Calliandra eriophylla"), "clonal" = c(TRUE,TRUE,FALSE,FALSE)), aggByGenet = TRUE, printMessages = FALSE )
And here's what the output of this call to trackSpp()
looks like:
temp <- datTrackSpp names(temp)[c(7,9,11)] <- c("basalArea", "survives_t+1", "size_t+1") (temp) #knitr::kable(head(temp[100:120,]))
If you did not allow any species to be clonal (clonal = 0
) or if
aggByGenet = TRUE
in your call to trackSpp()
, then your output
data frame will have one row for each genet, and is ready for demographic
analysis! If your output data frame is not yet aggregated by genet
(i.e. you use aggByGenet = FALSE
), then you need to
transform your data frame so that each genet is represented by only one row of
data. You can use the aggregateByGenet()
function from plantTracker
(see this
function's documentation for guidance), or your own method of choice.
You can stop here and proceed to your own analyses using the demographic data
you generated, or you can proceed with other plantTracker
functions outlined
below for some additional useful data.
getNeighbors()
It is often useful in demographic analyses to have some idea of the competition (or facilitation) that an individual organism is dealing with. Interactions between individuals can have a profound impact on whether an organism survives and grows. Spatial datasets of plant occurrence allow us to generate an estimate of the interactions an individual plant has with other plants by determining how many other individuals occupy the "local neighborhood" of each focal plant. While this isn't a direct measure of competition or facilitation, it gives us an estimate that we can include in demographic models.
Here is the generic getNeighbors()
function:
getNeighbors(dat, buff, method, compType = "allSpp", output = "summed", trackID = "trackID", species = "Species", quad = "Quad", year = "Year", site = "Site", geometry = "geometry", ...)
The getNeighbors()
function in plantTracker
calculates local neighborhood
density for each unique individual in your dataset. A user-specified buffer is
drawn around each individual, and then the function counts the number of other
plants within this buffer.This function can only be run on a dataset where each
unique individual (genet) is represented by only one row of data. If the genet
consists of multiple polygons, then they must be aggregated into one sf
MULTIPOLYGON
object. If your dataset has multiple rows for each genet, then
you can use the aggregateByGenet()
function to get it ready to use in
getNeighbors()
. Additionally, getNeighbors()
requires your dataset to have a
column containing a unique identifier for each genet. Across multiple years,
that genet must have the same unique identifier. If you are using this function
right after trackSpp()
, your dataset will already have this unique identifier
in a column called "trackID".
getNeighbors()
has several options that allow you
to customize how local neighborhood density is calculated.
datComp <- datTrackSpp[datTrackSpp$Year == 1922,] ggplot(datComp) + geom_sf(data = sf::st_buffer(datComp[datComp$trackID == "HETCON_1922_6",], .1), aes(), color = "#CC79A7", fill = "#CC79A7", alpha = .3, lty = 2) + geom_sf(aes(fill = Species)) + geom_sf(data = datComp[datComp$trackID == "HETCON_1922_6",], aes(), color = "#CC79A7", fill = "#CC79A7", alpha = 0, lwd = 1.5) + xlim(c(0,.5)) + ylim(c(.2,.7)) + xlab("quadrat horizontal edge (m)") + ylab("quadrat vertical edge (m)") + theme_classic() + theme(legend.text = element_text(face = "italic"), plot.margin = margin(1,0,1,0), legend.title = element_blank()) + scale_fill_discrete(type = c( "#009E73", "#F0E442", "#E69F00", "#56B4E9","#A6A6A6")) + scale_color_discrete(type = c( "#009E73", "#F0E442","#E69F00", "#56B4E9", "#A6A6A6"))
suppressWarnings(st_intersects(st_buffer(datComp[datComp$trackID == "HETCON_1922_6",], .1), datComp)) countMethod <- datComp[c(3,5,63,67,70),] areaMethodTemp <- suppressWarnings(suppressMessages(st_intersection(st_buffer( datComp[datComp$trackID == "HETCON_1922_6",], .1), datComp))) areaMethod <- areaMethodTemp[st_is(areaMethodTemp, c("POLYGON", "MULTIPOLYGON")),]
ggplot(datComp) + geom_sf(data = suppressWarnings(sf::st_buffer(datComp[datComp$trackID == "HETCON_1922_6",], .1)), aes(), color = "#CC79A7", fill = "#CC79A7", alpha = .3, lty = 2) + geom_sf(aes(fill = Species)) + geom_sf(data = datComp[datComp$trackID == "HETCON_1922_6",], aes(), color = "#CC79A7", fill = "#CC79A7", alpha = 0, lwd = 1.5) + geom_sf(data = countMethod, aes(), alpha = 0, lwd = 1.5) + geom_sf_label(data = countMethod, label = c(1,2,3,4,5), nudge_x = .035, label.padding = unit(.15, "lines"), label.size = unit(.05,"mm")) + xlim(c(0,.5)) + ylim(c(.2,.7)) + xlab("quadrat horizontal edge (m)") + ylab("quadrat vertical edge (m)") + theme_classic() + theme(legend.text = element_text(face = "italic"), plot.margin = margin(1,0,1,0), legend.title = element_blank()) + scale_fill_discrete(type = c( "#009E73", "#F0E442", "#E69F00", "#56B4E9","#A6A6A6")) + scale_color_discrete(type = c( "#009E73", "#F0E442","#E69F00", "#56B4E9", "#A6A6A6"))
areaMethodTemp <- suppressWarnings(st_intersection(st_buffer( datComp[datComp$trackID == "HETCON_1922_6",], .1), datComp)) areaMethod <- areaMethodTemp[areaMethodTemp$trackID.1 != "HETCON_1922_6",] ggplot(datComp) + geom_sf(data = suppressWarnings(sf::st_buffer(datComp[datComp$trackID == "HETCON_1922_6",], .1)), aes(), color = "#CC79A7", fill = "#CC79A7", alpha = .3, lty = 2) + geom_sf(aes(fill = Species)) + geom_sf(data = datComp[datComp$trackID == "HETCON_1922_6",], aes(), color = "#CC79A7", fill = "#CC79A7", alpha = 0, lwd = 1.5) + geom_sf(data = areaMethod, aes(), fill = "#A6A6A6", color = "grey50", alpha = 0.7, lwd = 1.5) + xlim(c(0,.5)) + ylim(c(.2,.7)) + xlab("quadrat horizontal edge (m)") + ylab("quadrat vertical edge (m)") + theme_classic() + theme(legend.text = element_text(face = "italic"), plot.margin = margin(1,0,1,0), legend.title = element_blank()) + scale_fill_discrete(type = c( "#009E73", "#F0E442", "#E69F00", "#56B4E9","#A6A6A6")) + scale_color_discrete(type = c( "#009E73", "#F0E442","#E69F00", "#56B4E9", "#A6A6A6")) ## basal area of focal individual = 0.004981431 ## area of buffer 0.06953875 - 0.004981431 = 0.06455732 ## area of interspecific competitors 2.504885e-05 + 9.549092e-04 + 1.952117e-03 + 2.461883e-05 + 2.461883e-05 = 0.002981313 ## area of intraspecific competitors 2.504885e-05 + 9.549092e-04 + 1.952117e-03 = 0.002932075 ## proportion of interspecific competitors 0.002981313 / 0.06455732 = 0.04618087 ## proportion of intraspecific competitors 0.002932075 / 0.06455732 = 0.04541816
Below are the arguments in the getNeighbors()
function.
dat
An sf
data frame in which each row represents data for a unique
individual organism in a unique year. The sf
geometry for each row must be
either MULTIPOLYGON
or POLYGON
geometry. In addition to a "geometry" column,
this data frame must have columns that contain data indicating the site,
quadrat, site, and year of each observation. There also must be a column that
contains a unique identifying value for each genet in each year. If dat
is
coming directly from trackSpp()
, this column will be called "trackID". buff
This is a single numeric value that indicates the desired width of the
"buffer" around the focal individual in which the competitors are to be counted.
This value must be in the same units as the spatial information in dat
. method
This is a character string that must equal either "count"
or "area"
.
If method = "count"
, then the number of individuals in the buffer area will be
tallied. If method = "area"
, then the proportion of the buffer area that is
occupied by other individuals will be calculated. compType
This is a character string that must be either "allSpp"
or
"oneSpp"
. If compType = "allSpp"
, then a metric of interspecific
competition is calculated, meaning that every individual within the buffer
around the focal individual is considered, no matter the species. If
compType = "oneSpp"
, then a metric of intraspecific competition is calculated,
meaning that only individuals of the same species as the focal individual will
be considered when calculating the competition metric. If no value is provided,
it will default to "allSpp". output
This is a character string that is set to either "summed"
or
"bySpecies"
. The default is "summed"
. This argument is only important to
consider if you are using compType = "allSpp"
. If output = "summed"
, then only
one count/area value is returned for each individual. This value is the total
count or area of all neighbors within the focal species buffer zone, regardless
of species. If output = "bySpecies"
, there is a count or area value returned for
each species present in the buffer zone. For example, you are using
getNeighbors()
with method = "count"
and compType = "allSpp"
. A focal
individual in your dataset has seven other plants inside its buffer zone, three
of species A, two of species B, and 2 of species C. If output = "summed"
, the
value in the "neighbors_count" column of the returned data frame will simply
contain the value "7". If output = "bySpecies"
, the "neighbors_count" column
for this individual will actually contain a named list
{r} list("Species A "= 5, "Species B" = 3, "Species C" = 7)
. The default value
of output
is "summed"
. trackID/species/quad/year/site/geometry
These arguments only need to be
included if the columns in dat
that contain the data for trackID, species,
site, quadrat, year and geometry of each observation are different from the
names "trackID, "Species", "Site", "Quad", "Year, and "geometry". For example,
if the column in your version of dat
that contains the species identity of
each observation is called "species_names", then the argument
species = "species_names"
must be included in your call to getNeighbors()
. The output of getNeighbors()
is an sf
data frame that is identical to the
input dat
, but with either one or two additional columns. If
method = "area"
, there are two columns added called "nBuff_area" and
"neighbors_area". The first contains the area of the buffer zone around each
focal individual. The second contains the basal area of neighbors that overlap
with a focal individual's buffer zone. If method = "count"
, there is only one
additional column added to the output, called "neighbors_count." This column
contains a count of the neighbors that occur within a focal individual's
buffer zone.
Here's an example of a getNeighbors()
function call using the resulting data
from the example in section 2.2, as well as the resulting data frame. Note
that method = "area"
, so two columns are added to the returned data frame:
datNeighbors <- plantTracker::getNeighbors(dat = datTrackSpp, buff = .15, method = "area", compType = "allSpp")
temp <- datNeighbors rownames(temp) <- NULL names(temp)[c(8,10,12)] <- c("basalArea", "survives_t+1", "size_t+1") (temp) #knitr::kable(head(temp[100:110,]), row.names = FALSE)
The example above uses the option output = "summed"
, which is the default for
the getNeighbors()
function. With this option, the "neighbors_area" or
"neighbors_count" column (depending on the method
argument) contains just one
value that sums the neighbor count or area across all neighbor species. However,
if output = "bySpecies"
, the "neighbors_count" or "neighbors_area" column
contains a list with the counts or areas broken down by species. The output
argument is described in more detail in section 3.1. If you want to use the
getNeighbors()
function to determine how the effect of neighbors differs
according to the species identity of those neighbors, setting output = "bySpecies"
allows you to do this. However, it is likely that your
subsequent analyses will need the by-species neighbor data in a matrix or data
frame format, rather than in a list nested inside a data frame, which is how
getNeighbors()
returns the data.
Here is some code that turns the data returned by
getNeighbors(output = "bySpecies")
into a matrix where each row has data for one
focal individual, and each column has data for one species. This format should
be easier to work with for further analysis.
# save the output of the getNeighbors() function datNeighbors_bySpp <- plantTracker::getNeighbors(dat = datTrackSpp, buff = .15, method = "area", compType = "allSpp", output = "bySpecies") # determine all of the possible species that can occupy the buffer zone compSpp <- unique(datTrackSpp$Species) temp <- lapply(X = datNeighbors_bySpp$neighbors_area, FUN = function(x) { tmp <- unlist(x) tmp2 <- tmp[compSpp] } ) for (i in 1:length(temp)) { # fix the column names names(temp[[i]]) <- compSpp # save the data in a matrix if (i == 1) { datOut <- temp[[i]] } else { datOut <- rbind(datOut, temp[[i]]) } } # make the rownames of the matrix correspond to the trackID of the focal individual rownames(datOut) <- datNeighbors_bySpp$trackID # show the first few rows of the datOut data frame: datOut[1:5,]
At this point, this dataset should be ready for you to use in any applications wish! There are a few additional functions that may help you in your analyses, and these are outlined in this section.
getRecruits()
functiongetLambda()
functiongetBasalAreas()
functiondrawQuadMap()
functionSpecific instructions for how to use each of these functions can be found in their documentation!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.