This page provides an overview for the get_troopdata() function, highlighting some of its potential uses.

First things first: let's load the {troopdata} package, along with {ggplot2} and {viridis}.

library(troopdata)
library(ggplot2)
library(viridis)

The troopdata package provides multiple functions to generate customizable datasets containing information on US military deployments and accompanying data. The get_troopdata() function is the core of the package and returns customized data on US overseas troop deployments specifically.

Country-Year Data

The first function of this package is the get_troopdata() function. At its most basic, this function returns a data frame of country-year troop deployment values for the time period selected with the startyear and endyear arguments.

## basic example code
library(troopdata)
# This example uses the basic function to gather total troop figures from 1990 to 2020 for the host given in the host argument ("USA" here)
example <- get_troopdata(host = "USA", startyear = 1990, endyear = 2020)

head(example)

For users who want more refined data, there are a number of arguments that allow the user to further tailor the output to their needs.

The host argument allows users to specify the set of hosts for which they would like data returned. This can be a numeric vector of Correlates of War (COW) Project country codes, a character vector of ISO3C country codes, a character vector of full country names, or a character vector of region names. Note that all values in the vector must use a single type of identifier at a time (i.e., they must all be numeric COW codes, ISO3C character codes, country names, or region names).
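
One practical reason to keep identifiers consistent is that R itself cannot hold mixed types in a single atomic vector. The short base R sketch below shows only that coercion behavior. It makes no claim about how get_troopdata() treats the resulting strings, and the variable names are illustrative.

```r
# Mixing a numeric COW code with an ISO3C code silently coerces everything to character
mixed <- c(200, "GBR")
class(mixed)
mixed

# Keeping one identifier type per vector avoids the coercion
cow_hosts <- c(200, 220)       # numeric COW codes
iso_hosts <- c("CAN", "GBR")   # ISO3C character codes
is.numeric(cow_hosts)
is.character(iso_hosts)
```

Because the number 200 becomes the string "200" in the mixed vector, it is no longer a numeric COW code, so it is safer to build a separate vector for each identifier type.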

For example, you can use a numeric vector of COW country codes like this:

# Let's make the host selection more specific
hostlist <- c(200, 220)

example <- get_troopdata(host = hostlist, startyear = 1990, endyear = 2020)

head(example)

Or you can use a character vector of ISO3C codes.

hostlist.char <- c("CAN", "GBR")

example.char <- get_troopdata(host = hostlist.char, startyear = 1970, endyear = 2020)

head(example.char)

Similarly, we can search for full country names:

hostlist.names <- c("Canada", "United Kingdom")

example.names <- get_troopdata(host = hostlist.names, startyear = 1970, endyear = 2020)

head(example.names)

When searching by country name, the function will do its best to identify the correct country from the character string supplied. This includes partial matches: if only a fragment of a country name is given, the function will try to return the correct country.

example.frag <- get_troopdata(host = "South Ko", startyear = 1970, endyear = 2020)

head(example.frag)

Finally, we can also search by region. Instead of passing a country name or code to the host argument, you can pass character strings that represent regions. In these cases the function returns the aggregate sum of all deployments within that region for the specified time period.

region.list <- c("Europe", "Asia")

example.region <- get_troopdata(host = region.list, startyear = 1970, endyear = 2020)

head(example.region)

Disaggregated Data

By default the get_troopdata() function returns the aggregate sum of active duty military personnel. But the original DMDC reports often include disaggregated figures, with separate counts for each branch of the military. The branch argument allows users to specify whether they would like to receive the aggregate sum of all branches or the disaggregated figures for each branch. This argument can take on three values: TRUE, FALSE, or a vector of branch names.

# Let's get the disaggregated data for the US deployments to Canada and the UK
hostlist <- c("Canada", "United Kingdom")

example.branch <- get_troopdata(host = hostlist, branch = TRUE, startyear = 1970, endyear = 2020)

head(example.branch)

In each case the _ad suffix on the variable name indicates "Active Duty" numbers for the given branch.

Note that the total does not necessarily equal the sum of the individual branches. The function returns the maximum annual value for each branch. In cases where there are quarterly values reported, the sum total may come from one quarter and the individual branch values may come from another quarter.
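
To see why this happens, here is a minimal base R sketch using invented numbers (not DMDC data). It assumes, as described above, that the annual value of each column is the maximum across that year's quarterly reports, taken independently for each column. The data frame and its column names are hypothetical.

```r
# Toy quarterly counts for one hypothetical host in one year (invented numbers)
quarterly <- data.frame(
  year    = c(2020, 2020),
  quarter = c(1, 2),
  army    = c(100, 60),
  navy    = c(20, 90),
  total   = c(120, 150)  # within each quarter, total = army + navy
)

# Take the annual maximum of each column independently
annual <- aggregate(cbind(army, navy, total) ~ year, data = quarterly, FUN = max)
print(annual)

# The branch maxima come from different quarters, so they need not add up to the total
annual$army + annual$navy
annual$total
```

Here the Army maximum comes from the first quarter and the Navy maximum from the second, so the branch columns sum to 190 while the reported total is 150, the largest single-quarter total.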

We can also include disaggregated data for National Guard and Reserve personnel, as well as DoD civilians. These numbers are generally only reported for more recent years, and are not available for all countries and time periods. Later updates will include more observations for earlier time periods where they are available.

hostlist <- c("Canada", "United Kingdom")

example.branch <- get_troopdata(host = hostlist, branch = TRUE, startyear = 1970, endyear = 2020, guard_reserve = TRUE, civilians = TRUE)

head(example.branch)

Time Periods

The most recent update also allows users to specify more fine-grained temporal coverage. DMDC reports have historically been released on an annual basis, but in more recent years they have been released twice annually or even quarterly. The quarters argument allows users to specify whether they would like to receive the quarterly data or the annual data. This argument can take on two values, TRUE or FALSE, with the default being FALSE.

If the user opts to return quarterly data, the function will return the month and quarter columns in addition to the year. Note that not every quarter corresponds to a quarterly report for every country. In some cases the quarterly value may be a 0 rather than NA. Additionally, the Army has not reported branch data for several recent quarters between 2022 and 2023 due to internal personnel management changes. Accordingly the aggregate totals for these quarters may be lower than expected or not available.

Here we use full country names. See! Neat!

# Let's get the quarterly data for the US deployments to Canada and the UK
hostlist <- c("Canada", "United Kingdom")

example.quarters <- get_troopdata(host = hostlist, branch = TRUE, startyear = 2015, endyear = 2022, quarters = TRUE)

head(example.quarters)

Reports

Finally, users may want to view the original DMDC reports that the data is drawn from. The reports argument allows users to specify whether they would like to receive those reports. This argument can take on two values, TRUE or FALSE, with the default being FALSE.

Users can specify the host, startyear, and endyear arguments as they would for the main function. The function will return a single data frame containing all of the original columns found in the DMDC reports upon which the data are based. The formatting and column names will be roughly consistent with the original reports, but the data will be filtered to only include the specified host countries and time period.

The source column in the data frame provides the month and year of the report that each observation was drawn from. This makes it easier to track down the original DMDC report. The current and archived reports can be found here: DMDC Reports

Also note that if reports is set to TRUE then the user must also set the quarters argument to TRUE.
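
As a rough sketch, a reports request could look like the following. It uses only the arguments described above (host, startyear, endyear, quarters, and reports). The year range is an arbitrary choice for illustration, the object name example.reports is hypothetical, and the call is wrapped in a requireNamespace() guard so the snippet also runs where {troopdata} is not installed.

```r
# Guarded so the sketch does nothing when {troopdata} is unavailable
if (requireNamespace("troopdata", quietly = TRUE)) {
  example.reports <- troopdata::get_troopdata(
    host = c("Canada", "United Kingdom"),
    startyear = 2015,   # illustrative range; adjust to the reports you need
    endyear = 2020,
    quarters = TRUE,    # must be TRUE whenever reports = TRUE, per the note above
    reports = TRUE
  )

  print(head(example.reports))

  # The source column identifies the month and year of the underlying report
  print(unique(example.reports$source))
}
```

The columns returned should roughly follow the original report layout, so it is worth inspecting names(example.reports) before relying on any particular column.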



meflynn/troopdata documentation built on Sept. 16, 2024, 8:34 a.m.