In ieco-lab/slfrsk: Research Compendium for Lycorma delicatula paninvasion study

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Wrangle and Tidy Package data

All of the following data sources will be read in as raw data from slfrsk/data/ (originally cleaned a bit from raw data versions in slfrsk/data-raw/) and then cleaned up individually in this vignette (see /data-raw/convert_data_rda.R for script and commenting that includes earlier unit conversion and transformation of raw data from .csv to .rda files used here). Depending on the situation, the resultant "cleaned/tidied" data may be saved to slfrsk/data/ as a new .rda file. In this vignette, the following data are treated:

Establishment potential
Transport potential
Impact potential

Setup

Here we load the required packages to prepare the data.

library(slfrsk) #this package for data sources
library(tidyverse) #data tidying and cleaning
library(here) #making directory pathways easier on different instances

Establishment

Maxent Ensemble Model

Previously, we built three separate species distribution models (SDMs) with MaxEnt (See species distribution modeling analysis for more information). We also obtained summary statistics for suitability in each geopolitical unit (country and U.S. state) in each SDM (as well as an ensemble SDM), which were transferred to the states_extracts and countries_extracts datasets (or states_extracts_ensemble and countries_extracts_ensemble for the ensemble SDM).

Here, we explore the ensemble maximum suitability by obtaining the maximum suitability per geopolitical unit for each SDM and calculating the mean and standard error across the models (or just use the extracted maximum value for ensemble models). These summarized suitabilities constitute our predicted Establishment potential.

We also strip Western Sahara from the countries datasets here and for other datsets, as it is not informative for our analyses (zero trade for Transport potential below).

#new extract of ensemble
if(TRUE){
  data("states_extracts_ensemble")

summary_states_extracts <- states_extracts_ensemble %>%
  group_by(geopol_unit) %>%
  summarize(grand_mean_max = mean(obs_max))

data("countries_extracts_ensemble")

summary_countries_extracts <- countries_extracts_ensemble %>%
  group_by(geopol_unit) %>%
  summarize(grand_mean_max = mean(obs_max))

#rm W. Sahara
summary_countries_extracts <- summary_countries_extracts %>%
    filter(geopol_unit != "Western Sahara")

}

#old extracts
if(FALSE){
  data("states_extracts")

summary_states_extracts <- states_extracts %>%
  group_by(geopol_unit) %>%
  summarize(grand_mean_max = mean(obs_max), grand_se_max = sd(obs_max) / sqrt(n_distinct(model)))

data("countries_extracts")

summary_countries_extracts <- countries_extracts %>%
  group_by(geopol_unit) %>%
  summarize(grand_mean_max = mean(obs_max), grand_se_max = sd(obs_max) / sqrt(n_distinct(model)))

#rm W. Sahara
summary_countries_extracts <- summary_countries_extracts %>%
    filter(geopol_unit != "Western Sahara")
}

Transport

Import Trade Data

Now that Establishment data are nicely organized, we can move on to Transport. We estimate Transport potential based on the average annual mass of goods traded (in metric tons) with SLF-established U.S. states over 2012--2017 (we also track value of traded goods in USD). Our existing data can be partitioned into geopolitical units as above (states and countries), but can further be separated into two categories: 1. states with confirmed SLF populations (present) and 2. states that have a high probability of hosting SLF populations in the near future based on proximity to present states and records of isolated observations (future).

Although we maintain several possible permutations of data to analyze, we focus on mass of goods traded for present scenarios, as these data are unlikely to be skewed by trade of high value goods that are less likely to lead to transportation of egg masses (e.g., electronics) and provide us with a more careful outlook for risk of slf paninvasion.

U.S. States

We first treat the states data by adding two letter codes and SLF-establishment status. For present, we treat the following states as having established SLF: CT, DE, MD, NJ, NY, OH, PA, VA, and WV. For future, we add the following states as having established SLF: NC. From here, the data are tidied and gathered before calculating the mean and standard error (SE) of trade with established states for 2012--2017. To do so, we first removed all cases of intrastate trade for established states. Then, we calculated the total trade with established states for each year before obtaining the overall mean and SE.

Below, we demonstrate the code for both present and future trade data in value (USD) for U.S. states.

#load states value
data("states_trade_summary_slf")
data("states_trade_summary_slf_future")

#code snippet to attach infection status and add state abbreviations
#state id's for adding to extracts
stateid <- c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "DC", "WV", "WI", "WY")
stateid <- tibble(stateid, states_trade_summary_slf$destination)
colnames(stateid) <- c("ID", "geopol_unit")

#PRESENT

#add in the infection status of each state while adding the abbreviations
 states_trade_summary_slf <-  left_join(states_trade_summary_slf, stateid, by = c("destination" = "geopol_unit")) %>%
    mutate(status = ifelse(ID %in% c("CT", "DE", "MD", "NJ", "NY", "OH", "PA", "VA", "WV"), "established", "not established")) %>%
    dplyr::select(destination, ID, status, everything())

  #tidying of data
  states_trade_summary_slf <- states_trade_summary_slf %>%
    gather(-destination:-status, key = "infected_state", value = "trade") %>%
    as.data.frame(.)

  #total infected states trade, average/SE
  summary_states_trade_value <- states_trade_summary_slf %>%
    group_by(destination) %>%
    #filter to be just trade with infected states present
    filter(str_detect(infected_state, "pa_") | 
             str_detect(infected_state, "de_") | 
             str_detect(infected_state, "va_") | 
             str_detect(infected_state, "nj_") | 
             str_detect(infected_state, "md_") | 
             str_detect(infected_state, "ct_") | 
             str_detect(infected_state, "ny_") | 
             str_detect(infected_state, "oh_") | 
             str_detect(infected_state, "wv_")
           ) %>%
    #remove the all state trade from consideration
    filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%  
    #make a temporary col
    mutate(tempcol = str_split(infected_state, "_")) %>%  
    #group by row
    rowwise() %>% 
    #split out the info about states and years of trade
    mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
    #rm the temp col
    dplyr::select(-tempcol) %>% 
    #group based now on the destination, year, and infection status
    #filter out the intrastate trade for infected states to avoid extra low averages by creating a hybrid string (DestinationState+abbreviated_infected state)
    group_by(destination, year, status) %>% 
    mutate(clear_intra = paste0(destination, source)) %>%
    filter(!clear_intra %in% 
             c("Connecticutct", 
               "Delawarede", 
               "Marylandmd", 
               "New Jerseynj", 
               "New Yorkny", 
               "Ohiooh", 
               "Pennsylvaniapa", 
               "Virginiava", 
               "West Virginiawv")
           ) %>%
    #remove workaround infected intrastate column
    dplyr::select(-clear_intra) %>%
    #add up the trade with infected states for each destination state and year (THIS IS NOT AVG INFECTED TRADE YET)
    summarize(avg_infected_trade = sum(trade)) %>%  
    group_by(destination, status) %>%
    #obtain the standard error (SE) of infected trade across years (this repeats across all of the entries across years and thus becomes its own value)
    mutate(se = (sd(avg_infected_trade, na.rm = T) / sqrt(length(year)))) %>% 
    #summarize the mean and SE across years! (there is no need to actually take the mean of SE here, but it makes summarizing simplier)
    summarise(avg_infected_trade = mean(avg_infected_trade, na.rm = T), se = mean(se))  

#FUTURE
    states_trade_summary_slf_future <-  left_join(states_trade_summary_slf_future, stateid, by = c("destination" = "geopol_unit")) %>%
    mutate(status = ifelse(ID %in% c("CT", "DE", "MD", "NC", "NJ", "NY", "OH", "PA", "VA", "WV"), "established", "not established")) %>%
    dplyr::select(destination, ID, status, everything())

  #tidying of data
  states_trade_summary_slf_future <- states_trade_summary_slf_future %>%
    gather(-destination:-status, key = "infected_state", value = "trade") %>%
    as.data.frame(.)

  #total infected states trade, average/SE
  summary_states_trade_value_future <- states_trade_summary_slf_future %>%
    group_by(destination) %>%
    filter(str_detect(infected_state, "pa_") | 
             str_detect(infected_state, "de_") | 
             str_detect(infected_state, "va_") | 
             str_detect(infected_state, "nj_") | 
             str_detect(infected_state, "md_") | 
             str_detect(infected_state, "nc_") | 
             str_detect(infected_state, "ny_") | 
             str_detect(infected_state, "oh_") | 
             str_detect(infected_state, "wv_") | 
             str_detect(infected_state, "ct_")
           ) %>%
    filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
    mutate(tempcol = str_split(infected_state, "_")) %>%
    rowwise() %>%
    mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
    dplyr::select(-tempcol) %>%
    group_by(destination, year, status) %>%
    #filter out the intrastate trade for infected states to avoid extra low averages
    mutate(clear_intra = paste0(destination, source)) %>%
    filter(!clear_intra %in% 
             c("Connecticutct", 
               "Delawarede", 
               "Marylandmd", 
               "North Carolinanc", 
               "New Jerseynj", 
               "New Yorkny", 
               "Ohiooh", 
               "Pennsylvaniapa", 
               "Virginiava", 
               "West Virginiawv"
               )
           ) %>%
    #remove workaround infected intrastate column
    dplyr::select(-clear_intra) %>%
    summarize(avg_infected_trade = sum(trade)) %>%
    group_by(destination, status) %>%
    mutate(se = (sd(avg_infected_trade, na.rm = T) / sqrt(length(year)))) %>%
    summarise(avg_infected_trade = mean(avg_infected_trade, na.rm = T), se = mean(se))

We repeat the same process for states but for the total mass of imported goods (mass). In doing so, we convert the existing units from kg to metric tons.

#load states mass
data("states_trade_mass_summary_slf")
data("states_trade_mass_summary_slf_future")

#PRESENT
  #add in the infection status of each state while adding the abbreviations
   states_trade_mass_summary_slf <- left_join(states_trade_mass_summary_slf, stateid, by = c("destination" = "geopol_unit")) %>%
    mutate(status = ifelse(ID %in% c("CT", "DE", "MD", "NJ", "NY", "OH", "PA", "VA", "WV"), "established", "not established")) %>%
    dplyr::select(destination, ID, status, everything())

     #tidying of data and dividing by 1000 to convert from kg to metric tons
  states_trade_mass_summary_slf <- states_trade_mass_summary_slf %>%
    gather(-destination:-status, key = "infected_state", value = "mass") %>%
    mutate(mass = mass / 1000) %>%
    as.data.frame(.)

  #mass
  summary_states_trade_mass <- states_trade_mass_summary_slf %>%
    group_by(destination) %>%
    #filter to be just trade with infected states present
    filter(str_detect(infected_state, "pa_") | 
             str_detect(infected_state, "de_") | 
             str_detect(infected_state, "va_") | 
             str_detect(infected_state, "nj_") | 
             str_detect(infected_state, "md_") | 
             str_detect(infected_state, "ct_") | 
             str_detect(infected_state, "ny_") | 
             str_detect(infected_state, "oh_") | 
             str_detect(infected_state, "wv_")
           ) %>%
    filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
    mutate(tempcol = str_split(infected_state, "_")) %>%
    rowwise() %>%
    mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
    dplyr::select(-tempcol) %>%
    group_by(destination, year, status) %>%
    #filter out the intrastate trade for infected states to avoid extra low averages
    mutate(clear_intra = paste0(destination, source)) %>%
    filter(!clear_intra %in% 
             c("Connecticutct", 
               "Delawarede", 
               "Marylandmd", 
               "New Jerseynj", 
               "New Yorkny", 
               "Ohiooh", 
               "Pennsylvaniapa", 
               "Virginiava", 
               "West Virginiawv")
           ) %>%
    #remove workaround infected intrastate column
    dplyr::select(-clear_intra) %>%
    summarize(avg_infected_mass = sum(mass)) %>%
    group_by(destination, status) %>%
    mutate(se = (sd(avg_infected_mass, na.rm = T) / sqrt(length(year)))) %>%
    summarise(avg_infected_mass = mean(avg_infected_mass, na.rm = T), se = mean(se))

#FUTURE
   #add col to identify infected states
  states_trade_mass_summary_slf_future <-  left_join(states_trade_mass_summary_slf_future, stateid, by = c("destination" = "geopol_unit")) %>%
    mutate(status = ifelse(ID %in% c("CT", "DE", "MD", "NC", "NJ", "NY", "OH", "PA", "VA", "WV"), "established", "not established")) %>%
    dplyr::select(destination, ID, status, everything())

  #tidying of data and dividing by 1000 to convert from kg to metric tons
  states_trade_mass_summary_slf_future <- states_trade_mass_summary_slf_future %>%
    gather(-destination:-status, key = "infected_state", value = "mass") %>%
    mutate(mass = mass / 1000) %>%
    as.data.frame(.)

  #mass
  summary_states_trade_mass_future <- states_trade_mass_summary_slf_future %>%
    group_by(destination) %>%
    filter(str_detect(infected_state, "pa_") | 
             str_detect(infected_state, "de_") | 
             str_detect(infected_state, "va_") | 
             str_detect(infected_state, "nj_") | 
             str_detect(infected_state, "md_") | 
             str_detect(infected_state, "nc_") | 
             str_detect(infected_state, "ny_") | 
             str_detect(infected_state, "oh_") | 
             str_detect(infected_state, "wv_") | 
             str_detect(infected_state, "ct_")
           ) %>%    
    filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
    mutate(tempcol = str_split(infected_state, "_")) %>%
    rowwise() %>%
    mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
    dplyr::select(-tempcol) %>%
    group_by(destination, year, status) %>%
    #filter out the intrastate trade for infected states to avoid extra low averages
    mutate(clear_intra = paste0(destination, source)) %>%
    filter(!clear_intra %in% 
             c("Connecticutct", 
               "Delawarede", 
               "Marylandmd", 
               "North Carolinanc", 
               "New Jerseynj", 
               "New Yorkny", 
               "Ohiooh", 
               "Pennsylvaniapa", 
               "Virginiava", 
               "West Virginiawv"
               )
           ) %>%
    #remove workaround infected intrastate column
    dplyr::select(-clear_intra) %>%
    summarize(avg_infected_mass = sum(mass)) %>%
    group_by(destination, status) %>%
    mutate(se = (sd(avg_infected_mass, na.rm = T) / sqrt(length(year)))) %>%
    summarise(avg_infected_mass = mean(avg_infected_mass, na.rm = T), se = mean(se))

Here, you can see the summary for the first 10 states for the future trade data for mass of trade (metric tons).

knitr::kable(head(summary_states_trade_mass_future, n = 10))

Countries

From here, we turn to complete the same analyses for countries as states (value of trade in USD and then mass of trade in metric tons). Notably, we do not have to worry about the intrastate trade that we filtered out in the states datasets. As such, we do not bother with attaching a $status identifier to each country (this is done as needed for visualization later).

#COUNTRIES

#VALUE
#load countries values
data("countries_trade_summary_slf")
data("countries_trade_summary_slf_future")

#PRESENT

#tidying of data and log transforming values
countries_trade_summary_slf <- countries_trade_summary_slf %>%
  gather(-destination, key = "infected_state", value = "trade") %>%
  as.data.frame(.)

  #value
  summary_countries_trade_value <- countries_trade_summary_slf %>%
    group_by(destination) %>%
    filter(str_detect(infected_state, "pa_") | 
             str_detect(infected_state, "de_") | 
             str_detect(infected_state, "va_") | 
             str_detect(infected_state, "nj_") | 
             str_detect(infected_state, "md_") | 
             str_detect(infected_state, "ct_") | 
             str_detect(infected_state, "ny_") | 
             str_detect(infected_state, "oh_") | 
             str_detect(infected_state, "wv_")
           ) %>%
    filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
    mutate(tempcol = str_split(infected_state, "_")) %>%
    rowwise() %>%
    mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
    dplyr::select(-tempcol) %>%
    group_by(destination, year) %>%
    summarize(avg_infected_trade = sum(trade)) %>%
    group_by(destination) %>%
    mutate(se = (sd(avg_infected_trade, na.rm = T) / sqrt(length(year)))) %>%
    summarise(avg_infected_trade = mean(avg_infected_trade, na.rm = T), se = mean(se))

#FUTURE

#tidying of data and log transforming values
countries_trade_summary_slf_future <- countries_trade_summary_slf_future %>%
  gather(-destination, key = "infected_state", value = "trade") %>%
  as.data.frame(.)

  #value
  summary_countries_trade_value_future <- countries_trade_summary_slf_future %>%
    group_by(destination) %>%
    filter(str_detect(infected_state, "pa_") | 
             str_detect(infected_state, "de_") | 
             str_detect(infected_state, "va_") | 
             str_detect(infected_state, "nj_") | 
             str_detect(infected_state, "md_") | 
             str_detect(infected_state, "nc_") | 
             str_detect(infected_state, "ny_") | 
             str_detect(infected_state, "oh_") | 
             str_detect(infected_state, "wv_") | 
             str_detect(infected_state, "ct_")
           ) %>%    
    filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
    mutate(tempcol = str_split(infected_state, "_")) %>%
    rowwise() %>%
    mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
    dplyr::select(-tempcol) %>%
    group_by(destination, year) %>%
    summarize(avg_infected_trade = sum(trade)) %>%
    group_by(destination) %>%
    mutate(se = (sd(avg_infected_trade, na.rm = T) / sqrt(length(year)))) %>%
    summarise(avg_infected_trade = mean(avg_infected_trade, na.rm = T), se = mean(se))

#MASS
#load countries mass
data("countries_trade_mass_summary_slf")
data("countries_trade_mass_summary_slf_future")

#PRESENT

#tidying of data and converting from kg to metric tons 
countries_trade_mass_summary_slf <- countries_trade_mass_summary_slf %>%
  gather(-destination, key = "infected_state", value = "mass") %>%
  mutate(mass = mass / 1000) %>%
  as.data.frame(.)

  #mass
  summary_countries_trade_mass <- countries_trade_mass_summary_slf %>%
    group_by(destination) %>%
    filter(str_detect(infected_state, "pa_") | 
             str_detect(infected_state, "de_") | 
             str_detect(infected_state, "va_") | 
             str_detect(infected_state, "nj_") | 
             str_detect(infected_state, "md_") | 
             str_detect(infected_state, "ct_") | 
             str_detect(infected_state, "ny_") | 
             str_detect(infected_state, "oh_") | 
             str_detect(infected_state, "wv_")
           ) %>%
    filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
    mutate(tempcol = str_split(infected_state, "_")) %>%
    rowwise() %>%
    mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
    dplyr::select(-tempcol) %>%
    group_by(destination, year) %>%
    summarize(avg_infected_mass = sum(mass)) %>%
    group_by(destination) %>%
    mutate(se = (sd(avg_infected_mass, na.rm = T) / sqrt(length(year)))) %>%
    summarise(avg_infected_mass = mean(avg_infected_mass, na.rm = T), se = mean(se))

#FUTURE

#tidying of data and converting from kg to metric tons 
countries_trade_mass_summary_slf_future <- countries_trade_mass_summary_slf_future %>%
  gather(-destination, key = "infected_state", value = "mass") %>%
  mutate(mass = mass / 1000) %>%
  as.data.frame(.)

  #mass
  summary_countries_trade_mass_future <- countries_trade_mass_summary_slf_future %>%
    group_by(destination) %>%
    filter(str_detect(infected_state, "pa_") | 
             str_detect(infected_state, "de_") | 
             str_detect(infected_state, "va_") | 
             str_detect(infected_state, "nj_") | 
             str_detect(infected_state, "md_") | 
             str_detect(infected_state, "nc_") | 
             str_detect(infected_state, "ny_") | 
             str_detect(infected_state, "oh_") | 
             str_detect(infected_state, "wv_") | 
             str_detect(infected_state, "ct_")
           ) %>%    
    filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>%
    mutate(tempcol = str_split(infected_state, "_")) %>%
    rowwise() %>%
    mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>%
    dplyr::select(-tempcol) %>%
    group_by(destination, year) %>%
    summarize(avg_infected_mass = sum(mass)) %>%
    group_by(destination) %>%
    mutate(se = (sd(avg_infected_mass, na.rm = T) / sqrt(length(year)))) %>%
    summarise(avg_infected_mass = mean(avg_infected_mass, na.rm = T), se = mean(se))

Here, you can see the summary for the first 10 countries for the future trade data for mass of trade (metric tons), just like the states data.

knitr::kable(head(summary_countries_trade_mass_future, n = 10))

Combined Final Trade Data

Now that we have several permutations of trade with SLF-established states data (both present and future scenarios) for both geopolitical units (states and countries) as measured in value and mass (USD and metric tons, respectively), we can combine some of these data into fewer dataframes to minimize the number of files we are juggling for future analyses. To do this, we merge the value and mass data for each geopolitical unit dataset.

As we do so, we take the time to create $\log_{10}$ transformed versions of the trade value and mass data. To avoid possible errors for this transformation, we add a constant ($value+1$) to data prior to transformation. We use the same transformation process for Impact potential data.

#STATES
summary_states_trade <- left_join(summary_states_trade_value, summary_states_trade_mass, by = c("destination", "status"), suffix = c("_trade", "_mass")) %>%
    mutate(log10_avg_infected_trade = log10(avg_infected_trade+1), log10_avg_infected_mass = log10(avg_infected_mass+1))

summary_states_trade_future <- left_join(summary_states_trade_value_future, summary_states_trade_mass_future, by = c("destination", "status"), suffix = c("_trade", "_mass")) %>%
    mutate(log10_avg_infected_trade = log10(avg_infected_trade+1), log10_avg_infected_mass = log10(avg_infected_mass+1))

#COUNTRIES

summary_countries_trade <- left_join(summary_countries_trade_value, summary_countries_trade_mass, by = c("destination"), suffix = c("_trade", "_mass")) %>%
    filter(destination != "Western Sahara") %>%
  mutate(log10_avg_infected_trade = log10(avg_infected_trade+1), log10_avg_infected_mass = log10(avg_infected_mass+1))

summary_countries_trade_future <- left_join(summary_countries_trade_value_future, summary_countries_trade_mass_future, by = c("destination"), suffix = c("_trade", "_mass"))  %>%
    filter(destination != "Western Sahara") %>%
  mutate(log10_avg_infected_trade = log10(avg_infected_trade+1), log10_avg_infected_mass = log10(avg_infected_mass+1))

Here is the merged states data for future:

knitr::kable(head(summary_states_trade_future, n = 10))

And here is the merged countries data for future:

knitr::kable(head(summary_countries_trade_future, n = 10))

Impact

Given the close relationship between cultivation of grapes (grapes) and winemaking (wine), we evaluate impact potential of SLF based on data from both.

Grapes

For grapes, two forms of cultivation data were available:

grape production (metric tons)
grape yield per unit area (metric tons/hectare)

Although we provide both sources of data, we focus more on production as a total measure of output per geopolitical unit.

data("states_grapes")

#summarize and log transform
summary_states_grapes <- states_grapes %>%
  pivot_wider(names_from = Item, values_from = Value, id_cols = c(geopol_unit, Year)) %>%
  group_by(geopol_unit) %>%
  summarize(avg_yield = mean(grape_yield, na.rm = TRUE), 
            se_yield = sd(grape_yield) / sqrt(n_distinct(Year)),
            avg_prod = mean(grape_production, na.rm = T), se_prod = sd(grape_production) / sqrt(n_distinct(Year))
            ) %>%
  drop_na(.) %>%
  mutate(log10_avg_yield = log10(avg_yield+1), log10_avg_prod = log10(avg_prod+1))

data("countries_grapes")

#first rm Western Sahara, then summarize and log transform, while adding in the step of changing NA's to zeros
summary_countries_grapes <- countries_grapes %>%
  filter(geopol_unit != "Western Sahara") %>%
  pivot_wider(names_from = Item, values_from = Value, id_cols = c(geopol_unit, Year)) %>%
  group_by(geopol_unit) %>%
  summarize(avg_yield = mean(grape_yield, na.rm = TRUE), 
            se_yield = sd(grape_yield) / sqrt(n_distinct(Year)),
            avg_prod = mean(grape_production, na.rm = T), se_prod = sd(grape_production) / sqrt(n_distinct(Year))
            ) %>%
  drop_na(.) %>%
  mutate(log10_avg_yield = log10(avg_yield+1), log10_avg_prod = log10(avg_prod+1))

#fix some problematic names of countries
summary_countries_grapes$geopol_unit[grep(pattern = "Ivoire", x = summary_countries_grapes$geopol_unit, value = F)] <- "Ivory Coast"
summary_countries_grapes$geopol_unit[grep(pattern = "Bolivia (Plurinational State of)", x = summary_countries_grapes$geopol_unit, value = F)] <- "Bolivia"
summary_countries_grapes$geopol_unit[grep(pattern = "China, Taiwan Province of", x = summary_countries_grapes$geopol_unit, value = F)] <- "Taiwan"
summary_countries_grapes$geopol_unit[grep(pattern = "Iran (Islamic Republic of)", x = summary_countries_grapes$geopol_unit, value = F)] <- "Iran"
summary_countries_grapes$geopol_unit[grep(pattern = "Venezuela (Bolivarian Republic of)", x = summary_countries_grapes$geopol_unit, value = F)] <- "Venezuela"
summary_countries_grapes$geopol_unit[grep(pattern = "Viet Nam", x = summary_countries_grapes$geopol_unit, value = F)] <- "Vietnam"

Wine

For wine, we were able to obtain wine production for both geopolitical unit datasets in units of mass, metric tons.

data("states_wine")

#summarize and log transform
summary_states_wine <- states_wine %>%
  group_by(geopol_unit) %>%
  summarize(avg_wine = mean(Mass, na.rm = T), se_wine = sd(Mass) / sqrt(n_distinct(Year)), log10_avg_wine = log10(mean(Mass, na.rm = T)+1))

data("countries_wine")

#summarize and log transform, then rm anything that started out as NA or zero
summary_countries_wine <- countries_wine %>%
  group_by(geopol_unit) %>%
  summarize(avg_wine = mean(Mass, na.rm = T), se_wine = sd(Mass) / sqrt(n_distinct(Year)), log10_avg_wine = log10(mean(Mass, na.rm = T)+1)) %>%
  filter(!is.infinite(log10_avg_wine) & !is.nan(log10_avg_wine))

Potentials Summary

Now that all of these data sources have been summarized and tidied, we will merge them together with left_join() sequentially, using the trade data (Transport potential) as the inital dataset that all others are joined to. The resultant datasets contain all permuatations of geopolitical units (states and countries) and infection scenarios (present and future).

During this process, we also use the wine and grape data to produce binary categorical data for each geopolitical unit to indicate whether it is involved directly in that industry. For this, we also manually add the United States as a grape producer in the countries datasets. Additionally, we manually add $status to the countries datasets as follows:

Native: China, India, Taiwan, and Vietnam
Established: Japan, South Korea, and United States

#STATES

##PRESENT

#add the binaries
states_summary_present <- summary_states_extracts %>%
  mutate(grapes = ifelse(geopol_unit %in% summary_states_grapes$geopol_unit, yes = "yes", no = "no")) %>%
  mutate(wine = ifelse(geopol_unit %in% summary_states_wine$geopol_unit, yes = "yes", no = "no")) %>%
  mutate(grapes = factor(grapes, levels = c("yes", "no")), wine = factor(wine, levels = c("yes", "no"))) 

#now add in the other data
states_summary_present <- states_summary_present %>%
  left_join(., summary_states_trade, by = c("geopol_unit" = "destination")) %>%   #trade
  left_join(., summary_states_wine, by = "geopol_unit") %>%   #wine
  left_join(., summary_states_grapes, by = "geopol_unit")     #grapes

#deal with NA
states_summary_present[is.na(states_summary_present)] <- 0

#join the new ID's
states_summary_present <- left_join(stateid, states_summary_present, by = "geopol_unit")
#################################
#tidy the order of cols
states_summary_present <- states_summary_present %>%
  #dplyr::select(ID, geopol_unit, status, grand_mean_max, grand_se_max, wine, grapes, avg_wine:log10_avg_prod, everything())
  dplyr::select(ID, geopol_unit, status, grand_mean_max, wine, grapes, avg_wine:log10_avg_prod, everything())


##FUTURE

#add the binaries
states_summary_future <- summary_states_extracts %>%
  mutate(grapes = ifelse(geopol_unit %in% summary_states_grapes$geopol_unit, yes = "yes", no = "no")) %>%
  mutate(wine = ifelse(geopol_unit %in% summary_states_wine$geopol_unit, yes = "yes", no = "no")) %>%
  mutate(grapes = factor(grapes, levels = c("yes", "no")), wine = factor(wine, levels = c("yes", "no"))) 

#now add in the other data
states_summary_future <- states_summary_future %>%
  left_join(., summary_states_trade_future, by = c("geopol_unit" = "destination")) %>%   #trade
  left_join(., summary_states_wine, by = "geopol_unit") %>%   #wine
  left_join(., summary_states_grapes, by = "geopol_unit")     #grapes

#deal with NA
states_summary_future[is.na(states_summary_future)] <- 0

#join the new ID's
states_summary_future <- left_join(stateid, states_summary_future, by = "geopol_unit")

#tidy the order of cols
states_summary_future <- states_summary_future %>%
    #dplyr::select(ID, geopol_unit, status, grand_mean_max, grand_se_max, wine, grapes, avg_wine:log10_avg_prod, everything())
    dplyr::select(ID, geopol_unit, status, grand_mean_max, wine, grapes, avg_wine:log10_avg_prod, everything())


#################################



#COUNTRIES

##PRESENT

#add in the binaries
countries_summary_present <- summary_countries_extracts %>%
  mutate(grapes = ifelse(geopol_unit %in% summary_countries_grapes$geopol_unit, yes = "yes", no = "no")) %>%
  mutate(wine = ifelse(geopol_unit %in% summary_countries_wine$geopol_unit, yes = "yes", no = "no"))

#change U.S. from no to yes for grapes
countries_summary_present$grapes[countries_summary_present$geopol_unit == "United States"] <- "yes"

#The factor fixing step is now here
countries_summary_present <- countries_summary_present %>%
  mutate(grapes = factor(grapes, levels = c("yes", "no")), wine = factor(wine, levels = c("yes", "no"))) 

#now drop NAs for the extracts and add in the other data
countries_summary_present <- countries_summary_present %>%
  drop_na(.) %>%
  left_join(., summary_countries_trade, by = c("geopol_unit" = "destination")) %>%   #trade
  left_join(., summary_countries_wine, by = "geopol_unit") %>%   #wine
  left_join(., summary_countries_grapes, by = "geopol_unit")     #grapes

#deal with NA
countries_summary_present[is.na(countries_summary_present)] <- 0

# add ID to make is simpler for labeling points
countries_summary_present <- countries_summary_present %>%  mutate(ID = geopol_unit)

# add a column for status establishment
countries_summary_present <- countries_summary_present %>%  mutate(status = "not established")
countries_summary_present$status[countries_summary_present$geopol_unit %in% c("China", "India", "Taiwan", "Vietnam")] <- "native"
countries_summary_present$status[countries_summary_present$geopol_unit %in% c("Japan", "South Korea", "United States")] <- "established"

#tidy the order of cols
countries_summary_present <- countries_summary_present %>%
    #dplyr::select(ID, geopol_unit, status, grand_mean_max, grand_se_max, wine, grapes, avg_wine:log10_avg_prod, everything())
    dplyr::select(ID, geopol_unit, status, grand_mean_max, wine, grapes, avg_wine:log10_avg_prod, everything())

##FUTURE

#add in the binaries
countries_summary_future <- summary_countries_extracts %>%
  mutate(grapes = ifelse(geopol_unit %in% summary_countries_grapes$geopol_unit, yes = "yes", no = "no")) %>%
  mutate(wine = ifelse(geopol_unit %in% summary_countries_wine$geopol_unit, yes = "yes", no = "no"))

# NOTE: must This also includes manually changing the **United States** to a grape producer.
#change U.S. from no to yes for grapes
countries_summary_future$grapes[countries_summary_future$geopol_unit == "United States"] <- "yes"

#The factor fixing step is now here
countries_summary_future <- countries_summary_future %>%
  mutate(grapes = factor(grapes, levels = c("yes", "no")), wine = factor(wine, levels = c("yes", "no"))) 

#now drop NAs for the extracts and add in the other data
countries_summary_future <- countries_summary_future %>%
  drop_na(.) %>%
  left_join(., summary_countries_trade_future, by = c("geopol_unit" = "destination")) %>%   #trade
  left_join(., summary_countries_wine, by = "geopol_unit") %>%   #wine
  left_join(., summary_countries_grapes, by = "geopol_unit")     #grapes

#deal with NA
countries_summary_future[is.na(countries_summary_future)] <- 0

# add ID to make is simpler for labeling points
countries_summary_future <- countries_summary_future %>%  mutate(ID = geopol_unit)

# add a column for status establishment
countries_summary_future <- countries_summary_future %>%  mutate(status = "not established")
countries_summary_future$status[countries_summary_future$geopol_unit %in% c("China", "India", "Taiwan", "Vietnam")] <- "native"
countries_summary_future$status[countries_summary_future$geopol_unit %in% c("Japan", "South Korea", "United States")] <- "established"

#tidy the order of cols
countries_summary_future <- countries_summary_future %>%
    #dplyr::select(ID, geopol_unit, status, grand_mean_max, grand_se_max, wine, grapes, avg_wine:log10_avg_prod, everything())
    dplyr::select(ID, geopol_unit, status, grand_mean_max, wine, grapes, avg_wine:log10_avg_prod, everything())

Save data

This section provides the code to save each of the final summary objects to the /data directory. They are currently set to not run, as the package should ship with the files already in /data. Note that this chunk can be run by setting the top if from FALSE to TRUE!

if(FALSE){
save(states_summary_present, file = file.path(here(), "data", "states_summary_present.rda"))
save(states_summary_future, file = file.path(here(), "data", "states_summary_future.rda"))
save(countries_summary_present, file = file.path(here(), "data", "countries_summary_present.rda"))
save(countries_summary_future, file = file.path(here(), "data", "countries_summary_future.rda"))
}

if(FALSE){
states_summary_future_ensemble <- states_summary_future
states_summary_present_ensemble <- states_summary_present
countries_summary_future_ensemble <- countries_summary_future
countries_summary_present_ensemble <- countries_summary_present

save(states_summary_future_ensemble, file = file.path(here(), "data", "states_summary_future_ensemble.rda"))
save(states_summary_present_ensemble, file = file.path(here(), "data", "states_summary_present_ensemble.rda"))
save(countries_summary_future_ensemble, file = file.path(here(), "data", "countries_summary_future_ensemble.rda"))
save(countries_summary_present_ensemble, file = file.path(here(), "data", "countries_summary_present_ensemble.rda"))

}

ieco-lab/slfrsk documentation built on Aug. 18, 2022, 10:44 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ieco-lab/slfrsk
Research Compendium for Lycorma delicatula paninvasion study

In ieco-lab/slfrsk: Research Compendium for Lycorma delicatula paninvasion study

Setup

Establishment

Maxent Ensemble Model

Transport

Import Trade Data

U.S. States

Countries

Combined Final Trade Data

Impact

Grapes

Wine

Potentials Summary

Save data

R Package Documentation

Browse R Packages

We want your feedback!

ieco-lab/slfrsk Research Compendium for Lycorma delicatula paninvasion study

In ieco-lab/slfrsk: Research Compendium for Lycorma delicatula paninvasion study

Setup

Establishment

Maxent Ensemble Model

Transport

Import Trade Data

U.S. States

Countries

Combined Final Trade Data

Impact

Grapes

Wine

Potentials Summary

Save data

R Package Documentation

Browse R Packages

We want your feedback!

ieco-lab/slfrsk
Research Compendium for Lycorma delicatula paninvasion study