library(rds.r)
library(knitr)
library(rmarkdown)
library(ggplot2)
library(scales)
library(gridExtra)

Overview

The tabulate method allows tabulations to be created by specifying the dimensions and measures that are desired. This operation can provide quick and simple analysis directly from the RDS server. This provides users with the ability to create a table or chart once in their code, markdown, or dashboard. If the data or metadata is updated on the server, the updates will automatically flow into the R code without any further adjustment needed.

Getting the Data Product

The first thing we need to do is set up a connection to the RDS server and get the data product we want to use, in this case we will use the SBA Paycheck Protection Loan data.

# provide the api URL to get the server
rds <- get.rds("https://public.richdataservices.com/rds")
catalog <- rds.r::getCatalog(rds, "covid19")
dataProduct <- rds.r::getDataProduct(catalog, "ppp_loan")

Basic Tabulation

We will begin with a basic tabulation and show the count of respondents by province To do this we only need to specify the dimension we want, the default measure is a count.

tabulation <- rds.tabulate(dataProduct, dimensions = "us_state_fips")

ggplot(tabulation@records, aes(x = us_state_fips, y = count, color = us_state_fips)) +
  geom_bar(stat = "identity", fill = "white") + 
  scale_y_continuous(labels = comma) + xlab("Province") + ylab("Count")

Basic Tabulation With Metadata

As you can see in the previous chart the states are returned as their 2 digit numeric code, which is how they are stored in the database. While these may be useful for some other sort of map, in a bar chart they are not helpful. If the RDS server has the metadata set up we can request that it return the code labels for us with the inject parameters.

This also ensures that R users are always able to get the metadata from the source. Because the server is providing the metadata, R users should not need to hard code any code values by hand. In addition, if the metadata ever changes (new codes added, typos fixed, etc) R users don't have to do change anything in their code because the server will provide the updated code values.

Here is the same bar chart with the labels injected.

tabulation <-
  rds.tabulate(dataProduct, dimensions = "us_state_fips", inject = T)

ggplot(tabulation@records, aes(
  x = us_state_fips,
  y = as.numeric(as.character(count)),
  color = us_state_fips
)) +
  geom_bar(stat = "identity", fill = "white") + 
  scale_y_continuous(labels = comma) + xlab("Province") + ylab("Count") +
  theme(axis.text.x = element_text(angle = 35, hjust = 1))

Specifying a Measure

If R users would like to get a different measure then the count, they will need to specify the measure(s) they would like to use. The syntax for a measure is as follows: <name>:<function>(<target_variable>). The name is the name of the variable that will be returned and can be whatever the user wants the computed variable to be called (should be valid SQL-like name), the function is the function to run on a variable in the data product, and the target variable is the name of the variable to perform the function on. If all variables should be used * can be used as the target variable, for example COUNT(*) will provide return the total count of records for the specified dimensions. Currently supported functions are:

In this example we will get the percentage of respondents by state.

tabulation <-
  rds.tabulate(
    dataProduct,
    dimensions = "us_state_fips",
    measures = "percent:PERCENT(*)",
    inject = T
  )

ggplot(tabulation@records, aes(x = us_state_fips, y = percent, color = us_state_fips)) +
  geom_bar(stat = "identity", fill = "white") + 
  scale_y_continuous(labels = function(x) paste0(x, "%")) + xlab("Province") + ylab("Percent of Respondents") +
  theme(axis.text.x = element_text(angle = 35, hjust = 1))

Multiple Dimensions and Measures

Multiple dimensions and measures can be used if desired. In this example we will get a cross tabulation of state and sex, with their percentages returned to us.

tabulation <-
  rds.tabulate(
    dataProduct,
    dimensions = c("us_state_fips", "gender"),
    measures = c("percent:PERCENT(*)"),
    inject = T
  )


# Make the plot
ggplot(tabulation@records, aes(x=us_state_fips, y=percent)) + geom_bar(stat="identity") +
  scale_y_continuous(name = "Percent") + 
  facet_grid(as.factor(tabulation@records$gender)) +
  theme(axis.text.x=element_text(angle=45, hjust = 1))


mtna/rds-r documentation built on July 30, 2023, 3:25 a.m.