knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Wrangle and Tidy Package data
All of the following data sources will be read in as raw data from slfrsk/data/
(originally cleaned a bit from raw data versions in slfrsk/data-raw/
) and then cleaned up individually in this vignette (see /data-raw/convert_data_rda.R
for script and commenting that includes earlier unit conversion and transformation of raw data from .csv
to .rda
files used here). Depending on the situation, the resultant "cleaned/tidied" data may be saved to slfrsk/data/
as a new .rda
file. In this vignette, the following data are treated:
Here we load the required packages to prepare the data.
library(slfrsk) #this package for data sources library(tidyverse) #data tidying and cleaning library(here) #making directory pathways easier on different instances
Previously, we built three separate species distribution models (SDMs) with MaxEnt (See species distribution modeling analysis for more information). We also obtained summary statistics for suitability in each geopolitical unit (country and U.S. state) in each SDM (as well as an ensemble SDM), which were transferred to the states_extracts
and countries_extracts
datasets (or states_extracts_ensemble
and countries_extracts_ensemble
for the ensemble SDM).
Here, we explore the ensemble maximum suitability by obtaining the maximum suitability per geopolitical unit for each SDM and calculating the mean and standard error across the models (or just use the extracted maximum value for ensemble models). These summarized suitabilities constitute our predicted Establishment potential.
We also strip Western Sahara from the countries
datasets here and for other datsets, as it is not informative for our analyses (zero trade for Transport potential below).
#new extract of ensemble if(TRUE){ data("states_extracts_ensemble") summary_states_extracts <- states_extracts_ensemble %>% group_by(geopol_unit) %>% summarize(grand_mean_max = mean(obs_max)) data("countries_extracts_ensemble") summary_countries_extracts <- countries_extracts_ensemble %>% group_by(geopol_unit) %>% summarize(grand_mean_max = mean(obs_max)) #rm W. Sahara summary_countries_extracts <- summary_countries_extracts %>% filter(geopol_unit != "Western Sahara") }
#old extracts if(FALSE){ data("states_extracts") summary_states_extracts <- states_extracts %>% group_by(geopol_unit) %>% summarize(grand_mean_max = mean(obs_max), grand_se_max = sd(obs_max) / sqrt(n_distinct(model))) data("countries_extracts") summary_countries_extracts <- countries_extracts %>% group_by(geopol_unit) %>% summarize(grand_mean_max = mean(obs_max), grand_se_max = sd(obs_max) / sqrt(n_distinct(model))) #rm W. Sahara summary_countries_extracts <- summary_countries_extracts %>% filter(geopol_unit != "Western Sahara") }
Now that Establishment data are nicely organized, we can move on to Transport. We estimate Transport potential based on the average annual mass of goods traded (in metric tons) with SLF-established U.S. states over 2012--2017 (we also track value of traded goods in USD). Our existing data can be partitioned into geopolitical units as above (states and countries), but can further be separated into two categories: 1. states with confirmed SLF populations (present
) and 2. states that have a high probability of hosting SLF populations in the near future based on proximity to present
states and records of isolated observations (future
).
Although we maintain several possible permutations of data to analyze, we focus on mass of goods traded for present
scenarios, as these data are unlikely to be skewed by trade of high value goods that are less likely to lead to transportation of egg masses (e.g., electronics) and provide us with a more careful outlook for risk of slf paninvasion.
We first treat the states data by adding two letter codes and SLF-establishment status. For present
, we treat the following states as having established SLF: CT, DE, MD, NJ, NY, OH, PA, VA, and WV. For future
, we add the following states as having established SLF: NC. From here, the data are tidied and gathered before calculating the mean and standard error (SE) of trade with established states for 2012--2017. To do so, we first removed all cases of intrastate trade for established states. Then, we calculated the total trade with established states for each year before obtaining the overall mean and SE.
Below, we demonstrate the code for both present
and future
trade data in value (USD) for U.S. states
.
#load states value data("states_trade_summary_slf") data("states_trade_summary_slf_future") #code snippet to attach infection status and add state abbreviations #state id's for adding to extracts stateid <- c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "DC", "WV", "WI", "WY") stateid <- tibble(stateid, states_trade_summary_slf$destination) colnames(stateid) <- c("ID", "geopol_unit") #PRESENT #add in the infection status of each state while adding the abbreviations states_trade_summary_slf <- left_join(states_trade_summary_slf, stateid, by = c("destination" = "geopol_unit")) %>% mutate(status = ifelse(ID %in% c("CT", "DE", "MD", "NJ", "NY", "OH", "PA", "VA", "WV"), "established", "not established")) %>% dplyr::select(destination, ID, status, everything()) #tidying of data states_trade_summary_slf <- states_trade_summary_slf %>% gather(-destination:-status, key = "infected_state", value = "trade") %>% as.data.frame(.) #total infected states trade, average/SE summary_states_trade_value <- states_trade_summary_slf %>% group_by(destination) %>% #filter to be just trade with infected states present filter(str_detect(infected_state, "pa_") | str_detect(infected_state, "de_") | str_detect(infected_state, "va_") | str_detect(infected_state, "nj_") | str_detect(infected_state, "md_") | str_detect(infected_state, "ct_") | str_detect(infected_state, "ny_") | str_detect(infected_state, "oh_") | str_detect(infected_state, "wv_") ) %>% #remove the all state trade from consideration filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>% #make a temporary col mutate(tempcol = str_split(infected_state, "_")) %>% #group by row rowwise() %>% #split out the info about states and years of trade mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>% #rm the temp col dplyr::select(-tempcol) %>% #group based now on the destination, year, and infection status #filter out the intrastate trade for infected states to avoid extra low averages by creating a hybrid string (DestinationState+abbreviated_infected state) group_by(destination, year, status) %>% mutate(clear_intra = paste0(destination, source)) %>% filter(!clear_intra %in% c("Connecticutct", "Delawarede", "Marylandmd", "New Jerseynj", "New Yorkny", "Ohiooh", "Pennsylvaniapa", "Virginiava", "West Virginiawv") ) %>% #remove workaround infected intrastate column dplyr::select(-clear_intra) %>% #add up the trade with infected states for each destination state and year (THIS IS NOT AVG INFECTED TRADE YET) summarize(avg_infected_trade = sum(trade)) %>% group_by(destination, status) %>% #obtain the standard error (SE) of infected trade across years (this repeats across all of the entries across years and thus becomes its own value) mutate(se = (sd(avg_infected_trade, na.rm = T) / sqrt(length(year)))) %>% #summarize the mean and SE across years! (there is no need to actually take the mean of SE here, but it makes summarizing simplier) summarise(avg_infected_trade = mean(avg_infected_trade, na.rm = T), se = mean(se)) #FUTURE states_trade_summary_slf_future <- left_join(states_trade_summary_slf_future, stateid, by = c("destination" = "geopol_unit")) %>% mutate(status = ifelse(ID %in% c("CT", "DE", "MD", "NC", "NJ", "NY", "OH", "PA", "VA", "WV"), "established", "not established")) %>% dplyr::select(destination, ID, status, everything()) #tidying of data states_trade_summary_slf_future <- states_trade_summary_slf_future %>% gather(-destination:-status, key = "infected_state", value = "trade") %>% as.data.frame(.) #total infected states trade, average/SE summary_states_trade_value_future <- states_trade_summary_slf_future %>% group_by(destination) %>% filter(str_detect(infected_state, "pa_") | str_detect(infected_state, "de_") | str_detect(infected_state, "va_") | str_detect(infected_state, "nj_") | str_detect(infected_state, "md_") | str_detect(infected_state, "nc_") | str_detect(infected_state, "ny_") | str_detect(infected_state, "oh_") | str_detect(infected_state, "wv_") | str_detect(infected_state, "ct_") ) %>% filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>% mutate(tempcol = str_split(infected_state, "_")) %>% rowwise() %>% mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>% dplyr::select(-tempcol) %>% group_by(destination, year, status) %>% #filter out the intrastate trade for infected states to avoid extra low averages mutate(clear_intra = paste0(destination, source)) %>% filter(!clear_intra %in% c("Connecticutct", "Delawarede", "Marylandmd", "North Carolinanc", "New Jerseynj", "New Yorkny", "Ohiooh", "Pennsylvaniapa", "Virginiava", "West Virginiawv" ) ) %>% #remove workaround infected intrastate column dplyr::select(-clear_intra) %>% summarize(avg_infected_trade = sum(trade)) %>% group_by(destination, status) %>% mutate(se = (sd(avg_infected_trade, na.rm = T) / sqrt(length(year)))) %>% summarise(avg_infected_trade = mean(avg_infected_trade, na.rm = T), se = mean(se))
We repeat the same process for states
but for the total mass of imported goods (mass
). In doing so, we convert the existing units from kg to metric tons.
#load states mass data("states_trade_mass_summary_slf") data("states_trade_mass_summary_slf_future") #PRESENT #add in the infection status of each state while adding the abbreviations states_trade_mass_summary_slf <- left_join(states_trade_mass_summary_slf, stateid, by = c("destination" = "geopol_unit")) %>% mutate(status = ifelse(ID %in% c("CT", "DE", "MD", "NJ", "NY", "OH", "PA", "VA", "WV"), "established", "not established")) %>% dplyr::select(destination, ID, status, everything()) #tidying of data and dividing by 1000 to convert from kg to metric tons states_trade_mass_summary_slf <- states_trade_mass_summary_slf %>% gather(-destination:-status, key = "infected_state", value = "mass") %>% mutate(mass = mass / 1000) %>% as.data.frame(.) #mass summary_states_trade_mass <- states_trade_mass_summary_slf %>% group_by(destination) %>% #filter to be just trade with infected states present filter(str_detect(infected_state, "pa_") | str_detect(infected_state, "de_") | str_detect(infected_state, "va_") | str_detect(infected_state, "nj_") | str_detect(infected_state, "md_") | str_detect(infected_state, "ct_") | str_detect(infected_state, "ny_") | str_detect(infected_state, "oh_") | str_detect(infected_state, "wv_") ) %>% filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>% mutate(tempcol = str_split(infected_state, "_")) %>% rowwise() %>% mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>% dplyr::select(-tempcol) %>% group_by(destination, year, status) %>% #filter out the intrastate trade for infected states to avoid extra low averages mutate(clear_intra = paste0(destination, source)) %>% filter(!clear_intra %in% c("Connecticutct", "Delawarede", "Marylandmd", "New Jerseynj", "New Yorkny", "Ohiooh", "Pennsylvaniapa", "Virginiava", "West Virginiawv") ) %>% #remove workaround infected intrastate column dplyr::select(-clear_intra) %>% summarize(avg_infected_mass = sum(mass)) %>% group_by(destination, status) %>% mutate(se = (sd(avg_infected_mass, na.rm = T) / sqrt(length(year)))) %>% summarise(avg_infected_mass = mean(avg_infected_mass, na.rm = T), se = mean(se)) #FUTURE #add col to identify infected states states_trade_mass_summary_slf_future <- left_join(states_trade_mass_summary_slf_future, stateid, by = c("destination" = "geopol_unit")) %>% mutate(status = ifelse(ID %in% c("CT", "DE", "MD", "NC", "NJ", "NY", "OH", "PA", "VA", "WV"), "established", "not established")) %>% dplyr::select(destination, ID, status, everything()) #tidying of data and dividing by 1000 to convert from kg to metric tons states_trade_mass_summary_slf_future <- states_trade_mass_summary_slf_future %>% gather(-destination:-status, key = "infected_state", value = "mass") %>% mutate(mass = mass / 1000) %>% as.data.frame(.) #mass summary_states_trade_mass_future <- states_trade_mass_summary_slf_future %>% group_by(destination) %>% filter(str_detect(infected_state, "pa_") | str_detect(infected_state, "de_") | str_detect(infected_state, "va_") | str_detect(infected_state, "nj_") | str_detect(infected_state, "md_") | str_detect(infected_state, "nc_") | str_detect(infected_state, "ny_") | str_detect(infected_state, "oh_") | str_detect(infected_state, "wv_") | str_detect(infected_state, "ct_") ) %>% filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>% mutate(tempcol = str_split(infected_state, "_")) %>% rowwise() %>% mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>% dplyr::select(-tempcol) %>% group_by(destination, year, status) %>% #filter out the intrastate trade for infected states to avoid extra low averages mutate(clear_intra = paste0(destination, source)) %>% filter(!clear_intra %in% c("Connecticutct", "Delawarede", "Marylandmd", "North Carolinanc", "New Jerseynj", "New Yorkny", "Ohiooh", "Pennsylvaniapa", "Virginiava", "West Virginiawv" ) ) %>% #remove workaround infected intrastate column dplyr::select(-clear_intra) %>% summarize(avg_infected_mass = sum(mass)) %>% group_by(destination, status) %>% mutate(se = (sd(avg_infected_mass, na.rm = T) / sqrt(length(year)))) %>% summarise(avg_infected_mass = mean(avg_infected_mass, na.rm = T), se = mean(se))
Here, you can see the summary for the first 10 states for the
future
trade data for mass of trade (metric tons).
knitr::kable(head(summary_states_trade_mass_future, n = 10))
From here, we turn to complete the same analyses for countries
as states (value of trade in USD and then mass of trade in metric tons). Notably, we do not have to worry about the intrastate trade that we filtered out in the states
datasets. As such, we do not bother with attaching a $status
identifier to each country (this is done as needed for visualization later).
#COUNTRIES #VALUE #load countries values data("countries_trade_summary_slf") data("countries_trade_summary_slf_future") #PRESENT #tidying of data and log transforming values countries_trade_summary_slf <- countries_trade_summary_slf %>% gather(-destination, key = "infected_state", value = "trade") %>% as.data.frame(.) #value summary_countries_trade_value <- countries_trade_summary_slf %>% group_by(destination) %>% filter(str_detect(infected_state, "pa_") | str_detect(infected_state, "de_") | str_detect(infected_state, "va_") | str_detect(infected_state, "nj_") | str_detect(infected_state, "md_") | str_detect(infected_state, "ct_") | str_detect(infected_state, "ny_") | str_detect(infected_state, "oh_") | str_detect(infected_state, "wv_") ) %>% filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>% mutate(tempcol = str_split(infected_state, "_")) %>% rowwise() %>% mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>% dplyr::select(-tempcol) %>% group_by(destination, year) %>% summarize(avg_infected_trade = sum(trade)) %>% group_by(destination) %>% mutate(se = (sd(avg_infected_trade, na.rm = T) / sqrt(length(year)))) %>% summarise(avg_infected_trade = mean(avg_infected_trade, na.rm = T), se = mean(se)) #FUTURE #tidying of data and log transforming values countries_trade_summary_slf_future <- countries_trade_summary_slf_future %>% gather(-destination, key = "infected_state", value = "trade") %>% as.data.frame(.) #value summary_countries_trade_value_future <- countries_trade_summary_slf_future %>% group_by(destination) %>% filter(str_detect(infected_state, "pa_") | str_detect(infected_state, "de_") | str_detect(infected_state, "va_") | str_detect(infected_state, "nj_") | str_detect(infected_state, "md_") | str_detect(infected_state, "nc_") | str_detect(infected_state, "ny_") | str_detect(infected_state, "oh_") | str_detect(infected_state, "wv_") | str_detect(infected_state, "ct_") ) %>% filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>% mutate(tempcol = str_split(infected_state, "_")) %>% rowwise() %>% mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>% dplyr::select(-tempcol) %>% group_by(destination, year) %>% summarize(avg_infected_trade = sum(trade)) %>% group_by(destination) %>% mutate(se = (sd(avg_infected_trade, na.rm = T) / sqrt(length(year)))) %>% summarise(avg_infected_trade = mean(avg_infected_trade, na.rm = T), se = mean(se)) #MASS #load countries mass data("countries_trade_mass_summary_slf") data("countries_trade_mass_summary_slf_future") #PRESENT #tidying of data and converting from kg to metric tons countries_trade_mass_summary_slf <- countries_trade_mass_summary_slf %>% gather(-destination, key = "infected_state", value = "mass") %>% mutate(mass = mass / 1000) %>% as.data.frame(.) #mass summary_countries_trade_mass <- countries_trade_mass_summary_slf %>% group_by(destination) %>% filter(str_detect(infected_state, "pa_") | str_detect(infected_state, "de_") | str_detect(infected_state, "va_") | str_detect(infected_state, "nj_") | str_detect(infected_state, "md_") | str_detect(infected_state, "ct_") | str_detect(infected_state, "ny_") | str_detect(infected_state, "oh_") | str_detect(infected_state, "wv_") ) %>% filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>% mutate(tempcol = str_split(infected_state, "_")) %>% rowwise() %>% mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>% dplyr::select(-tempcol) %>% group_by(destination, year) %>% summarize(avg_infected_mass = sum(mass)) %>% group_by(destination) %>% mutate(se = (sd(avg_infected_mass, na.rm = T) / sqrt(length(year)))) %>% summarise(avg_infected_mass = mean(avg_infected_mass, na.rm = T), se = mean(se)) #FUTURE #tidying of data and converting from kg to metric tons countries_trade_mass_summary_slf_future <- countries_trade_mass_summary_slf_future %>% gather(-destination, key = "infected_state", value = "mass") %>% mutate(mass = mass / 1000) %>% as.data.frame(.) #mass summary_countries_trade_mass_future <- countries_trade_mass_summary_slf_future %>% group_by(destination) %>% filter(str_detect(infected_state, "pa_") | str_detect(infected_state, "de_") | str_detect(infected_state, "va_") | str_detect(infected_state, "nj_") | str_detect(infected_state, "md_") | str_detect(infected_state, "nc_") | str_detect(infected_state, "ny_") | str_detect(infected_state, "oh_") | str_detect(infected_state, "wv_") | str_detect(infected_state, "ct_") ) %>% filter(str_detect(infected_state, "201") & !str_detect(infected_state, "all")) %>% mutate(tempcol = str_split(infected_state, "_")) %>% rowwise() %>% mutate(source = unlist(tempcol)[1], year = unlist(tempcol)[2]) %>% dplyr::select(-tempcol) %>% group_by(destination, year) %>% summarize(avg_infected_mass = sum(mass)) %>% group_by(destination) %>% mutate(se = (sd(avg_infected_mass, na.rm = T) / sqrt(length(year)))) %>% summarise(avg_infected_mass = mean(avg_infected_mass, na.rm = T), se = mean(se))
Here, you can see the summary for the first 10 countries for the
future
trade data for mass of trade (metric tons), just like thestates
data.
knitr::kable(head(summary_countries_trade_mass_future, n = 10))
Now that we have several permutations of trade with SLF-established states data (both present
and future
scenarios) for both geopolitical units (states
and countries
) as measured in value and mass (USD and metric tons, respectively), we can combine some of these data into fewer dataframes to minimize the number of files we are juggling for future analyses. To do this, we merge the value and mass data for each geopolitical unit dataset.
As we do so, we take the time to create $\log_{10}$ transformed versions of the trade value and mass data. To avoid possible errors for this transformation, we add a constant ($value+1$) to data prior to transformation. We use the same transformation process for Impact potential data.
#STATES summary_states_trade <- left_join(summary_states_trade_value, summary_states_trade_mass, by = c("destination", "status"), suffix = c("_trade", "_mass")) %>% mutate(log10_avg_infected_trade = log10(avg_infected_trade+1), log10_avg_infected_mass = log10(avg_infected_mass+1)) summary_states_trade_future <- left_join(summary_states_trade_value_future, summary_states_trade_mass_future, by = c("destination", "status"), suffix = c("_trade", "_mass")) %>% mutate(log10_avg_infected_trade = log10(avg_infected_trade+1), log10_avg_infected_mass = log10(avg_infected_mass+1)) #COUNTRIES summary_countries_trade <- left_join(summary_countries_trade_value, summary_countries_trade_mass, by = c("destination"), suffix = c("_trade", "_mass")) %>% filter(destination != "Western Sahara") %>% mutate(log10_avg_infected_trade = log10(avg_infected_trade+1), log10_avg_infected_mass = log10(avg_infected_mass+1)) summary_countries_trade_future <- left_join(summary_countries_trade_value_future, summary_countries_trade_mass_future, by = c("destination"), suffix = c("_trade", "_mass")) %>% filter(destination != "Western Sahara") %>% mutate(log10_avg_infected_trade = log10(avg_infected_trade+1), log10_avg_infected_mass = log10(avg_infected_mass+1))
Here is the merged
states
data forfuture
:
knitr::kable(head(summary_states_trade_future, n = 10))
And here is the merged
countries
data forfuture
:
knitr::kable(head(summary_countries_trade_future, n = 10))
Given the close relationship between cultivation of grapes (grapes
) and winemaking (wine
), we evaluate impact potential of SLF based on data from both.
For grapes
, two forms of cultivation data were available:
Although we provide both sources of data, we focus more on production as a total measure of output per geopolitical unit.
data("states_grapes") #summarize and log transform summary_states_grapes <- states_grapes %>% pivot_wider(names_from = Item, values_from = Value, id_cols = c(geopol_unit, Year)) %>% group_by(geopol_unit) %>% summarize(avg_yield = mean(grape_yield, na.rm = TRUE), se_yield = sd(grape_yield) / sqrt(n_distinct(Year)), avg_prod = mean(grape_production, na.rm = T), se_prod = sd(grape_production) / sqrt(n_distinct(Year)) ) %>% drop_na(.) %>% mutate(log10_avg_yield = log10(avg_yield+1), log10_avg_prod = log10(avg_prod+1)) data("countries_grapes") #first rm Western Sahara, then summarize and log transform, while adding in the step of changing NA's to zeros summary_countries_grapes <- countries_grapes %>% filter(geopol_unit != "Western Sahara") %>% pivot_wider(names_from = Item, values_from = Value, id_cols = c(geopol_unit, Year)) %>% group_by(geopol_unit) %>% summarize(avg_yield = mean(grape_yield, na.rm = TRUE), se_yield = sd(grape_yield) / sqrt(n_distinct(Year)), avg_prod = mean(grape_production, na.rm = T), se_prod = sd(grape_production) / sqrt(n_distinct(Year)) ) %>% drop_na(.) %>% mutate(log10_avg_yield = log10(avg_yield+1), log10_avg_prod = log10(avg_prod+1)) #fix some problematic names of countries summary_countries_grapes$geopol_unit[grep(pattern = "Ivoire", x = summary_countries_grapes$geopol_unit, value = F)] <- "Ivory Coast" summary_countries_grapes$geopol_unit[grep(pattern = "Bolivia (Plurinational State of)", x = summary_countries_grapes$geopol_unit, value = F)] <- "Bolivia" summary_countries_grapes$geopol_unit[grep(pattern = "China, Taiwan Province of", x = summary_countries_grapes$geopol_unit, value = F)] <- "Taiwan" summary_countries_grapes$geopol_unit[grep(pattern = "Iran (Islamic Republic of)", x = summary_countries_grapes$geopol_unit, value = F)] <- "Iran" summary_countries_grapes$geopol_unit[grep(pattern = "Venezuela (Bolivarian Republic of)", x = summary_countries_grapes$geopol_unit, value = F)] <- "Venezuela" summary_countries_grapes$geopol_unit[grep(pattern = "Viet Nam", x = summary_countries_grapes$geopol_unit, value = F)] <- "Vietnam"
For wine
, we were able to obtain wine production for both geopolitical unit datasets in units of mass, metric tons.
data("states_wine") #summarize and log transform summary_states_wine <- states_wine %>% group_by(geopol_unit) %>% summarize(avg_wine = mean(Mass, na.rm = T), se_wine = sd(Mass) / sqrt(n_distinct(Year)), log10_avg_wine = log10(mean(Mass, na.rm = T)+1)) data("countries_wine") #summarize and log transform, then rm anything that started out as NA or zero summary_countries_wine <- countries_wine %>% group_by(geopol_unit) %>% summarize(avg_wine = mean(Mass, na.rm = T), se_wine = sd(Mass) / sqrt(n_distinct(Year)), log10_avg_wine = log10(mean(Mass, na.rm = T)+1)) %>% filter(!is.infinite(log10_avg_wine) & !is.nan(log10_avg_wine))
Now that all of these data sources have been summarized and tidied, we will merge them together with left_join()
sequentially, using the trade
data (Transport potential) as the inital dataset that all others are joined to. The resultant datasets contain all permuatations of geopolitical units (states
and countries
) and infection scenarios (present
and future
).
During this process, we also use the wine
and grape
data to produce binary categorical data for each geopolitical unit to indicate whether it is involved directly in that industry. For this, we also manually add the United States as a grape producer in the countries
datasets. Additionally, we manually add $status
to the countries
datasets as follows:
#STATES ##PRESENT #add the binaries states_summary_present <- summary_states_extracts %>% mutate(grapes = ifelse(geopol_unit %in% summary_states_grapes$geopol_unit, yes = "yes", no = "no")) %>% mutate(wine = ifelse(geopol_unit %in% summary_states_wine$geopol_unit, yes = "yes", no = "no")) %>% mutate(grapes = factor(grapes, levels = c("yes", "no")), wine = factor(wine, levels = c("yes", "no"))) #now add in the other data states_summary_present <- states_summary_present %>% left_join(., summary_states_trade, by = c("geopol_unit" = "destination")) %>% #trade left_join(., summary_states_wine, by = "geopol_unit") %>% #wine left_join(., summary_states_grapes, by = "geopol_unit") #grapes #deal with NA states_summary_present[is.na(states_summary_present)] <- 0 #join the new ID's states_summary_present <- left_join(stateid, states_summary_present, by = "geopol_unit") ################################# #tidy the order of cols states_summary_present <- states_summary_present %>% #dplyr::select(ID, geopol_unit, status, grand_mean_max, grand_se_max, wine, grapes, avg_wine:log10_avg_prod, everything()) dplyr::select(ID, geopol_unit, status, grand_mean_max, wine, grapes, avg_wine:log10_avg_prod, everything()) ##FUTURE #add the binaries states_summary_future <- summary_states_extracts %>% mutate(grapes = ifelse(geopol_unit %in% summary_states_grapes$geopol_unit, yes = "yes", no = "no")) %>% mutate(wine = ifelse(geopol_unit %in% summary_states_wine$geopol_unit, yes = "yes", no = "no")) %>% mutate(grapes = factor(grapes, levels = c("yes", "no")), wine = factor(wine, levels = c("yes", "no"))) #now add in the other data states_summary_future <- states_summary_future %>% left_join(., summary_states_trade_future, by = c("geopol_unit" = "destination")) %>% #trade left_join(., summary_states_wine, by = "geopol_unit") %>% #wine left_join(., summary_states_grapes, by = "geopol_unit") #grapes #deal with NA states_summary_future[is.na(states_summary_future)] <- 0 #join the new ID's states_summary_future <- left_join(stateid, states_summary_future, by = "geopol_unit") #tidy the order of cols states_summary_future <- states_summary_future %>% #dplyr::select(ID, geopol_unit, status, grand_mean_max, grand_se_max, wine, grapes, avg_wine:log10_avg_prod, everything()) dplyr::select(ID, geopol_unit, status, grand_mean_max, wine, grapes, avg_wine:log10_avg_prod, everything()) ################################# #COUNTRIES ##PRESENT #add in the binaries countries_summary_present <- summary_countries_extracts %>% mutate(grapes = ifelse(geopol_unit %in% summary_countries_grapes$geopol_unit, yes = "yes", no = "no")) %>% mutate(wine = ifelse(geopol_unit %in% summary_countries_wine$geopol_unit, yes = "yes", no = "no")) #change U.S. from no to yes for grapes countries_summary_present$grapes[countries_summary_present$geopol_unit == "United States"] <- "yes" #The factor fixing step is now here countries_summary_present <- countries_summary_present %>% mutate(grapes = factor(grapes, levels = c("yes", "no")), wine = factor(wine, levels = c("yes", "no"))) #now drop NAs for the extracts and add in the other data countries_summary_present <- countries_summary_present %>% drop_na(.) %>% left_join(., summary_countries_trade, by = c("geopol_unit" = "destination")) %>% #trade left_join(., summary_countries_wine, by = "geopol_unit") %>% #wine left_join(., summary_countries_grapes, by = "geopol_unit") #grapes #deal with NA countries_summary_present[is.na(countries_summary_present)] <- 0 # add ID to make is simpler for labeling points countries_summary_present <- countries_summary_present %>% mutate(ID = geopol_unit) # add a column for status establishment countries_summary_present <- countries_summary_present %>% mutate(status = "not established") countries_summary_present$status[countries_summary_present$geopol_unit %in% c("China", "India", "Taiwan", "Vietnam")] <- "native" countries_summary_present$status[countries_summary_present$geopol_unit %in% c("Japan", "South Korea", "United States")] <- "established" #tidy the order of cols countries_summary_present <- countries_summary_present %>% #dplyr::select(ID, geopol_unit, status, grand_mean_max, grand_se_max, wine, grapes, avg_wine:log10_avg_prod, everything()) dplyr::select(ID, geopol_unit, status, grand_mean_max, wine, grapes, avg_wine:log10_avg_prod, everything()) ##FUTURE #add in the binaries countries_summary_future <- summary_countries_extracts %>% mutate(grapes = ifelse(geopol_unit %in% summary_countries_grapes$geopol_unit, yes = "yes", no = "no")) %>% mutate(wine = ifelse(geopol_unit %in% summary_countries_wine$geopol_unit, yes = "yes", no = "no")) # NOTE: must This also includes manually changing the **United States** to a grape producer. #change U.S. from no to yes for grapes countries_summary_future$grapes[countries_summary_future$geopol_unit == "United States"] <- "yes" #The factor fixing step is now here countries_summary_future <- countries_summary_future %>% mutate(grapes = factor(grapes, levels = c("yes", "no")), wine = factor(wine, levels = c("yes", "no"))) #now drop NAs for the extracts and add in the other data countries_summary_future <- countries_summary_future %>% drop_na(.) %>% left_join(., summary_countries_trade_future, by = c("geopol_unit" = "destination")) %>% #trade left_join(., summary_countries_wine, by = "geopol_unit") %>% #wine left_join(., summary_countries_grapes, by = "geopol_unit") #grapes #deal with NA countries_summary_future[is.na(countries_summary_future)] <- 0 # add ID to make is simpler for labeling points countries_summary_future <- countries_summary_future %>% mutate(ID = geopol_unit) # add a column for status establishment countries_summary_future <- countries_summary_future %>% mutate(status = "not established") countries_summary_future$status[countries_summary_future$geopol_unit %in% c("China", "India", "Taiwan", "Vietnam")] <- "native" countries_summary_future$status[countries_summary_future$geopol_unit %in% c("Japan", "South Korea", "United States")] <- "established" #tidy the order of cols countries_summary_future <- countries_summary_future %>% #dplyr::select(ID, geopol_unit, status, grand_mean_max, grand_se_max, wine, grapes, avg_wine:log10_avg_prod, everything()) dplyr::select(ID, geopol_unit, status, grand_mean_max, wine, grapes, avg_wine:log10_avg_prod, everything())
This section provides the code to save each of the final summary objects to the /data
directory. They are currently set to not run, as the package should ship with the files already in /data
. Note that this chunk can be run by setting the top if
from FALSE
to TRUE
!
if(FALSE){ save(states_summary_present, file = file.path(here(), "data", "states_summary_present.rda")) save(states_summary_future, file = file.path(here(), "data", "states_summary_future.rda")) save(countries_summary_present, file = file.path(here(), "data", "countries_summary_present.rda")) save(countries_summary_future, file = file.path(here(), "data", "countries_summary_future.rda")) }
if(FALSE){ states_summary_future_ensemble <- states_summary_future states_summary_present_ensemble <- states_summary_present countries_summary_future_ensemble <- countries_summary_future countries_summary_present_ensemble <- countries_summary_present save(states_summary_future_ensemble, file = file.path(here(), "data", "states_summary_future_ensemble.rda")) save(states_summary_present_ensemble, file = file.path(here(), "data", "states_summary_present_ensemble.rda")) save(countries_summary_future_ensemble, file = file.path(here(), "data", "countries_summary_future_ensemble.rda")) save(countries_summary_present_ensemble, file = file.path(here(), "data", "countries_summary_present_ensemble.rda")) }
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.