library(rds.r)
library(knitr)
library(rmarkdown)

Overview

The select method queries a data product for record level data using the parameters defined in the RDS API. The RDS API will limit any call to 20 records, or 10000 cells by default. This is to ensure that large amounts of data are not being thoughtless passed over the network.

R users would typically be an exception to this rule. The R library is designed to automatically page through the data and build up a complete data set for the R user to work with. The following examples will turn this off (autoPage = FALSE) in order to demonstrate the different fuctions of the select query.

Getting the Data Product

The first thing we need to do is set up a connection to the RDS server and get the data product we want to use.

rds <- get.rds("http://dev.richdataservices.com")
catalog <- rds.r::getCatalog(rds, "uscensus")
dataProduct <- rds.r::getDataProduct(catalog, "census2010-hp")

Basic Select

Getting the Data

Now that we have the data product we can access its data through the select method. The simplest select command will return raw data regardless of if there are any codes associated with the variables.

This will print out the API calls it makes to build up the data set so the user is informed of where in the process the select method is.

# Set autoPage = FALSE for example so we don't have the entire data set.
# We will also limit the columns to 9 and rows to 10 so that the table looks OK in the HTML.
dataSet <-
  rds.r::select(dataProduct,
                colLimit = 9,
                limit = 10,
                autoPage = FALSE)
kable(dataSet@records)
dataSet@variables

Getting the Metadata

In addition to the records, every data set returned by the select method will include any variable metadata that is associated with the data set.

dataSet@variables

Injecting Metadata

In the case that the values of a variable are coded variables and the codes are documented in RDS we can inject these into the returned records with the inject parameter. Notice that the state, region, and division variables have had their numeric codes replaced with the code label.

# Set autoPage = FALSE for example so we don't have the entire data set.
# We will also limit the rows to 5 so that the table looks OK in the HTML.
dataSet <-
  rds.r::select(dataProduct,
                limit = 5,
                autoPage = FALSE,
                inject = TRUE)
paged_table(dataSet@records)

Selecting Columns

RDS allows for a number of different ways to search for variable data. Users can provide the name, a regex, or keyword to search on.

# We will search for SerialNO directly, anything that starts with P (regex)
dataSet <-
  rds.r::select(
    dataProduct,
    cols = "serialno,p.*",
    limit = 5,
    autoPage = FALSE,
    inject = TRUE
  )
kable(dataSet@records)

Users can also get variables by keyword. By prefixing the column name with $ users can specify the keyword to search on. RDS will search variable metadata (names, labels, descriptions, etc) to find variables with the keyword in it and return them

# We will search with the keyword of code
dataSet <-
  rds.r::select(
    dataProduct,
    cols = "$code",
    limit = 5,
    autoPage = FALSE,
    inject = TRUE
  )
kable(dataSet@records)


mtna/rds-r documentation built on July 30, 2023, 3:25 a.m.