The spread of invasive species is due both to diffusive spread and human-assisted jump dispersal, often caused by species hitchhiking on vehicles. The aim of this vignette is to differentiate diffusive spread from jump dispersal in invasive species based on occurrence data. This identification required the development of a directional analysis of species presence. We demonstrate this method using occurrence data of the spotted lanternfly, Lycorma delicatula, an invasive Hemiptera in the United States.
We load the necessary packages.
library(magrittr) library(tidyverse) library(sf) library(ggplot2) library(here) library(dplyr) library(slfjumps)
The SLF dataset is available from the lydemapr
companion package. In the download_data
folder, then in the v2_2023
subfolder, download the lyde_data_v2.zip
file and place it in the data
folder of your local jumpID
project. At the date at which this vignette is written, lydemapr
contains data up to year 2022.
We load this dataset that contains all the occurrences of the spotted lanternfly, and take a look at it.
slf <- read.csv(file.path(here(), "data", "lyde_data_v2", "lyde.csv")) slf %<>% select(bio_year, latitude, longitude, lyde_established) tibble::tibble(slf)
The dataset contains 831,039 rows, each corresponding to a SLF survey at a specific date and location. The columns that we will using are:
bio_year: the biological year of the occurrence (see lydemapr
for a description of the difference between year and bio_year)
latitude and longitude (XY coordinates). The precise location of SLF surveys is here summarized in 1-km² cells. All coordinates are in WGS84 (EPSG 4326)
* lyde_established: whether several live SLF were found in this survey
First, to use the jumpID
package, we need to rename the columns with generic names expected by the package.
slf %<>% rename(year = bio_year, established = lyde_established)
Second, multiple surveys were sometimes conducted at the same location during the same year, resulting in a redundant dataset for the purpose of our analysis. We summarize the information available at each location every year, so that a row now represents the detection status at a location for a given year.
Note: when several surveys indicate that SLF are "present" the same year at the same location, it could be tempting to categorize SLF as "established". However, the category "present" often refers to dead individuals, although this information is not explicitly available. We use a conservative approach and consider SLF as established in a cell only if one of the surveys considered it as established.
grid_data <- slf %>% dplyr::group_by(year, latitude, longitude) %>% dplyr::summarise(established = any(established)) %>% ungroup() tibble::tibble(grid_data)
The dataset now has 130,015 rows, corresponding to r length(unique(grid_data$latitude))
x r length(unique(grid_data$longitude))
cells. Let's look at these data on a map.
# extract a map of the States and recodes state labels to show the two-letter code rather than the full state name. # obtaining simple feature objects for states and finding centroids for label positioning states <- sf::st_as_sf(maps::map("state", plot = FALSE, fill = TRUE)) %>% st_transform(crs = 4326) sf::sf_use_s2(FALSE) states <- cbind(states, st_coordinates(st_centroid(states))) sf::sf_use_s2(TRUE) # making table key for state 2-letter abbreviations # the vectors state.abb and state.name contains strings of all # US states and abbreviations state_abbr <- tibble(state.name = stringr::str_to_lower(state.name), state.abb) %>% dplyr::left_join(tibble(ID = states$ID), ., by = c(ID = "state.name")) %>% dplyr::mutate(state.abb = replace_na(state.abb, "")) # adding 2-letter codes to sf states$code <- state_abbr$state.abb # Map all surveys ggplot(data = states) + geom_point(data = grid_data, aes(x = longitude, y = latitude)) + xlab("Longitude") + ylab("Latitude") + geom_sf(data = states, alpha = 0) + facet_wrap(~year) + theme_classic()
Cells were predominantly surveyed in the Eastern US, but there are also cells documented in other parts of the US, starting in 2019. Positive cells (established SLF) are only in the Eastern US, so maps will be zoomed in on the Eastern US from here.
Every year, there is a large number of cells surveyed with positive (SLF established) or negative results (SLF not established). A zoomed-in map allows to see the cell grid.
ggplot(data = states) + geom_point(data = grid_data, aes(x = longitude, y = latitude, col = established), size = 1) + scale_color_manual(values = c("gray", "black")) + labs(col = "SLF established") + xlab("Longitude") + ylab("Latitude") + geom_sf(data = states, alpha = 0) + geom_text(data = states, aes(X, Y, label = code), size = 2) + coord_sf(xlim = c(-78.5, -78), ylim = c(39, 39.5), expand = FALSE) + theme(legend.position = "bottom")
The progression of the invasion will be measured starting from the introduction point that has been documented in Barringer et al. 2015. We store the coordinates of the introduction point in an object for the next analyses. If the precise introduction point is unknown, it may be replaced by the centroid of the invasion at the time the invasive species was discovered in the invasive range.
# Coordinates of the introduction site, extracted from Barringer et al. 2015 # As a table (for distance calculations): centroid_df <- data.frame(longitude = -75.675340, latitude = 40.415240) # Replace by the centroid of the invasion in the first years of the invasion if the introduction point is unknown: # centroid_df <- slf %>% filter(year == 2014) %>% # summarise(latitude = mean(.$latitude), # longitude = mean(.$longitude))
To finish preparing the dataset for analysis, we calculate the distance between each survey cell and the introduction point.
grid_data %<>% dplyr::mutate(DistToIntro = geosphere::distGeo(grid_data[,c(3,2)], centroid_df)/1000) # the distance is obtained in meters, and we divide it by 1000 to obtain kilometers given the scale of this invasion. summary(grid_data) # save this dataset write.csv(grid_data, file.path(here(), "exported-data", "grid_data.csv"), row.names = F)
The dataset is now ready for the jump analysis.
A set of four successive functions will be run to separate cells where SLF is established due to continuous, diffusive spread, from cells where SLF is established due to jump dispersal. We will first run the functions on a tentative set of parameters to understand how the analysis works. The optimization of the parameters will be described later in this vignette.
Considering that the expansion of the invasion is heterogeneous in space, the jump analyses requires the invasion range to be divided into sectors with the introduction site as the origin. The first function, attribute_sectors()
attributes a sector number to each cell of the dataset, to be used in the jump analysis. Here, we divide the invasion range into 16 sectors. See vignette #2 "Decreasing calculation time" to learn how changing the number of sectors may make your analysis faster.
grid_data_sectors <- slfjumps::attribute_sectors(dataset = grid_data, # dataset to be explored nb_sectors = 16, # number of sectors to divide the invasion range centroid = c(-75.675340, 40.415240)) # vector containing the centroid coordinates as long/lat # Map results ggplot(data = states) + geom_point(data = grid_data_sectors, aes(x = longitude, y = latitude, col = as.factor(sectors_nb))) + geom_sf(data = states, alpha = 0) + geom_text(data = states, aes(X, Y, label = code), size = 2) + theme(legend.position = "bottom") + xlab("Longitude") + ylab("Latitude") + coord_sf(xlim = c(-82, -72), ylim = c(38, 43), expand = FALSE) + labs(col = "Sector number")
The analysis will go through each sector successively.
The function find_thresholds()
will then search for the first discontinuity in the SLF distribution. Points are sorted by increasing distance to the introduction point, and the function goes through each consecutive pair of points, starting from the introduction point and going outwards. It stops once it identifies a distance larger than a defined gap_size
between two consecutive points, marking a discontinuity in the SLF distribution. The last point before this discontinuity marks the putative limit of the diffusive spread.
The function runs independently for each sector. It is considered that populations do not go extinct, and data are cumulated over time. The limit of diffusive spread cannot go back towards the introduction point over time.
The input dataset is the output of attribute_sectors(). The gap_size
parameter defines the minimal distance between two consecutive points that marks a discontinuity in the SLF distribution, in km, and it is here set to 15 km. If the option "negatives" is set to TRUE (default), the function will check that there are negative surveys in the discontinuity, so that the absence of SLF established is not due to an absence of surveys in the area.
Results_thresholds <- slfjumps::find_thresholds(dataset = grid_data_sectors, gap_size = 15, negatives = T) # store resulting objects preDist <- Results_thresholds$preDist # list of threshold points = limits of the #diffusive spread. Will be completed in find_secDiff() potJumps <- Results_thresholds$potJumps # list of potential jumps. Will be #pruned in find_jumps() # Make a single object for the map thresholds_map <- dplyr::bind_rows(preDist %>% dplyr::mutate(Type = "Diffusion threshold"), potJumps %>% dplyr::mutate(Type = "Potential jump")) # Map results ggplot(data = states) + geom_point(data = grid_data %>% filter(established == T), aes(x = longitude, y = latitude), col = "gray") + geom_point(data = thresholds_map, aes(x = longitude, y = latitude, col = Type), size = 1) + geom_sf(data = states, alpha = 0) + facet_wrap(~year) + geom_text(data = states, aes(X, Y, label = code), size = 2) + theme_classic() + theme(legend.position = "bottom") + scale_color_manual(values = c("blue", "green")) + xlab("Longitude") + ylab("Latitude") + labs(col = "Identification") + coord_sf(xlim = c(-82, -72), ylim = c(38, 43), expand = FALSE)
On this map, blue points indicate the thresholds identified by find_thresholds()
, i.e., the limits of the diffusive spread in every direction, for every year. At this point of the analysis, all points farther than these thresholds are stored as potential jumps (green points).
The function find_jumps()
prunes the list of potential jumps of cells that are less than Jumps
. The pruned points may be examined in the last function, find_secDiff()
. The analysis may also stop here if only jumps are of biological interest.
Results_jumps <- slfjumps::find_jumps(grid_data = grid_data, potJumps = potJumps, gap_size = 15, crs = 4326) # store resulting objects Jumps <- Results_jumps$Jumps # final list of jumps diffusers <- Results_jumps$diffusers # list of positive cells stemming from diffusive spread potDiffusion <- Results_jumps$potDiffusion # list of cells containing a mix of secondary diffusion and additional threshold points. Will be pruned in find_secDiff() # Make a single object for the map jumps_map <- dplyr::bind_rows(preDist %>% mutate(Type = "Diffusion threshold"), Jumps %>% mutate(Type = "Jump"), potDiffusion %>% mutate(Type = "Potential diffusion")) # Map results ggplot(data = states) + geom_point(data = grid_data %>% filter(established == T), aes(x = longitude, y = latitude), col = "gray") + geom_point(data = jumps_map, aes(x = longitude, y = latitude, col = Type)) + geom_sf(data = states, alpha = 0) + facet_wrap(~year) + geom_text(data = states, aes(X, Y, label = code), size = 2) + theme_classic() + theme(legend.position = "bottom") + scale_color_manual(values = c("blue", "green", "orange")) + xlab("Longitude") + ylab("Latitude") + labs(col = "Identification") + coord_sf(xlim = c(-82, -72), ylim = c(38, 43), expand = FALSE)
Cells that were discarded from the jump list are either additional threshold points or secondary diffusion from a previous jump. find_secDiff()
attributes them to the correct category by checking whether they are close from a previous jump, or a diffusion point.
Results_secDiff <- slfjumps::find_secDiff(potDiffusion = potDiffusion, Jumps = Jumps, diffusers = diffusers, Dist = preDist, gap_size = 15, crs = 4326) # store resulting objects secDiff <- Results_secDiff$secDiff Dist <- Results_secDiff$Dist # Make a single object for the map secDiff_map <- dplyr::bind_rows(Dist %>% mutate(Type = "Diffusion threshold"), Jumps %>% mutate(Type = "Jump"), secDiff %>% mutate(Type = "Secondary diffusion")) # Map results ggplot(data = states) + geom_point(data = grid_data %>% filter(established == T), aes(x = longitude, y = latitude), col = "gray") + geom_point(data = jumps_map, aes(x = longitude, y = latitude, col = Type)) + geom_sf(data = states, alpha = 0) + facet_wrap(~year) + geom_text(data = states, aes(X, Y, label = code), size = 2) + theme_classic() + theme(legend.position = "bottom") + scale_color_manual(values = c("blue", "green", "orange")) + xlab("Longitude") + ylab("Latitude") + labs(col = "Identification") + coord_sf(xlim = c(-82, -72), ylim = c(38, 43), expand = FALSE)
All these analyses can be done in one bout using the wrapper function.
jumps_wrapper <- slfjumps::find_jumps_wrapper(dataset = grid_data, nb_sectors = 16, centroid = c(-75.675340, 40.415240), gap_size = 15, negatives = T, crs = 4326) # store resulting objects Dist <- jumps_wrapper$Dist Jumps <- jumps_wrapper$Jumps secDiff <- jumps_wrapper$secDiff # save them write.csv(Dist, file.path(here(), "exported-data", "thresholds.csv"), row.names = F) write.csv(Jumps, file.path(here(), "exported-data", "jumps.csv"), row.names = F) write.csv(secDiff, file.path(here(), "exported-data", "secDiffusion.csv"), row.names = F)
We plot the results on a map.
# Make a single object for the map jumps_wrapper_map <- dplyr::bind_rows(Dist %>% dplyr::mutate(Type = "Diffusion threshold"), Jumps %>% dplyr::mutate(Type = "Jump"), secDiff %>% dplyr::mutate(Type = "Secondary diffusion")) # Map results ggplot(data = states) + geom_point(data = grid_data %>% filter(established == F), aes(x = longitude, y = latitude), col = "gray90") + geom_point(data = grid_data %>% filter(established == T), aes(x = longitude, y = latitude), col = "gray30") + geom_point(data = jumps_wrapper_map, aes(x = longitude, y = latitude, col = Type)) + geom_sf(data = states, alpha = 0) + facet_wrap(~year) + geom_text(data = states, aes(X, Y, label = code), size = 2) + theme_classic() + theme(legend.position = "bottom") + scale_color_manual(values = c("blue", "orange", "green")) + xlab("Longitude") + ylab("Latitude") + labs(col = "Identification") + coord_sf(xlim = c(-82, -72), ylim = c(38, 43), expand = FALSE)
Jumps <- read.csv(file.path(here(), "exported-data", "jumps.csv"))
Some jumps were clustered in the same area. Jump clusters may stem from independent dispersal jumps, i.e. SLF hitchhiked multiple times to these locations the same year. Alternatively, jump clusters may result from SLF quickly spreading around a single dispersal jump. Finally, they can be a mix between these two hypotheses. We identify these jump clusters to better understand jump locations.
First, we delineate these jump clusters as jumps located less than the gap size from each other. Check how many points there are per jump cluster.
Jump_groups <- slfjumps::group_jumps(Jumps, gap_size = 15) Groups <- Jump_groups %>% dplyr::group_by(year, Group) %>% dplyr::summarise(Nb = n()) %>% dplyr::arrange(-Nb) %>% dplyr::filter(Nb > 1) ggplot(Groups, aes(x = as.factor(year), y = Nb)) + geom_boxplot() + theme_classic() + ylab("Number of jumps per cluster") + xlab("Year")
We summarize each jump cluster by summarizing each of them by their most central point.
Jump_clusters <- slfjumps::rarefy_groups(Jump_groups) %>% dplyr::mutate(Rarefied = TRUE) write.csv(Jump_clusters, file.path(here(), "exported-data", "jump_clusters.csv"), row.names = F) # Map these groups and the rarefied points ggplot(data = states) + geom_sf(data = states, fill = "white") + coord_sf(xlim = c(-82, -72), ylim = c(38, 43), expand = FALSE) + geom_point(data = Jump_groups, aes(x = longitude, y = latitude, col = as.factor(Group)), shape = 19, size = 3, show.legend = F) + geom_point(data = Jump_clusters, aes(x = longitude, y = latitude)) + labs(x = "Longitude", y = "Latitude")
These r dim(Jump_groups)[1]
jumps were rarefied into r dim(Jump_clusters)[1]
jump clusters.
-- end of vignette --
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.