README.md

rds.r

Overview

This R library facilitates access to and integration with the MTNA Rich Data Services (RDS) API, enabling immediate access to RDS-backed datasets and metadata collections. RDS provides an on-premises or cloud-based solution for concurrently accessing, querying, tabulating, and packaging data and metadata through a flexible REST API.

Why RDS?

Retrieving data and metadata for analytical, reporting, or visualization purposes is typically a time- and resource-consuming process that involves several steps, such as:

- Locating and downloading the data
- Converting and loading it into R
- Computing subsets or aggregations
- Finding relevant documentation
- Manually transcribing codes/classifications/labels and other descriptive elements into R objects

RDS greatly simplifies this process by offering a powerful REST API that performs all of the above in a single step. There is no need to download data, convert across formats, or spend hours skimming through cryptic PDF, Excel, Word, and other legacy files for documentation.

RDS combines on the fly querying and tabulation capabilities with metadata retrieval features. Comprehensive variable and classification metadata can accompany any data queried through RDS, enabling immediate reuse and rendering.

Visit the RDS web site for detailed information on the platform capabilities, or learn more about how to complement and deliver your data to your users.

Install

# We are working on getting into the CRAN repository; for now, local installs are necessary.
# install() is provided by the devtools package.
library(devtools)
setwd("rds.r")
install("rds.r")

In the examples below, we will be using data from the 1948 American National Election Study (ANES) dataset hosted on MTNA's public RDS server. This simple dataset contains 662 records and 65 variables/columns.

Querying Data (select)

Background

Let's imagine that we are researching the 1948 United States presidential election between Harry Truman and Thomas Dewey. We are interested in demographic data about the respondents and in why they did or did not vote for Harry Truman. We could take a look at the documentation on the ANES website, which has a lot of good information about the variables and their codes, stored in an ASCII format.

However, to identify the variables of interest we would need to read through the names and labels of every variable in the dataset and pick out the ones that have to do with respondent demographics or questions about Truman.

Calling RDS

RDS will save you the trouble. Instead of manually reading through the variable information, we can search by keyword and have the matching variables and their metadata returned to us. This returns more than data: the variable and classification metadata are available as well, which lets us document the variables we are using and gives ourselves and others more context around the data.

# For the purposes of this example we will use the 'select' function with autoPage turned off
# and a row limit, so that the table does not get too big for the HTML output. For analysis we
# would call 'select' without a limit to return the entire data set:
# data <- select('http://richdataservices.com/public/api/catalog/', 'test', 'anes1948',
#     cols = '$truman,$respondent')

# We will limit the variable properties returned to make our data dictionary more
# visually appealing
varProperties <- "id,label,question,storageType,width,classification"

dataSet <- select("http://richdataservices.com/public/api/catalog/", "test", 
    "anes1948", cols = "$truman,$respondent", limit = 10, varProperties = varProperties, autoPage = FALSE)

# Variable information
metadata <- dataSet@metadata
variableDocs <- rds:::variables(metadata)
varTable <- sjPlot::sjt.df(variableDocs, useViewer = F, describe = FALSE, encoding = "UTF-8", 
    no.output = TRUE, altr.row.col = TRUE, show.rownames = FALSE)$knitr

# Data
data <- dataSet@data
dataTable <- sjPlot::sjt.df(data, useViewer = F, describe = FALSE, encoding = "UTF-8", 
    no.output = TRUE, altr.row.col = TRUE, show.rownames = FALSE)$knitr

Dictionary

| id | label | question | storageType | width | classification |
|---|---|---|---|---|---|
| V480014a | WHY PPL VTD FOR TRUMAN 1 | Q. 5. WHY DO YOU THINK PEOPLE VOTED FOR TRUMAN. Q. 5A. ARE THERE ANY OTHER KINDS OF REASONS WHY YOU THINK PEOPLE VOTED FOR TRUMAN. | NUMERIC | 2 | V480014A |
| V480014b | WHY PPL VTD FOR TRUMAN 2 | Q. 5. WHY DO YOU THINK PEOPLE VOTED FOR TRUMAN. Q. 5A. ARE THERE ANY OTHER KINDS OF REASONS WHY YOU THINK PEOPLE VOTED FOR TRUMAN. | NUMERIC | 2 | V480014B |
| V480015a | WHY PPL VTD AGNST TRUMAN 1 | Q. 6. DO YOU THINK THERE WAS ANYTHING SPECIAL ABOUT TRUMAN THAT MADE SOME PEOPLE VOTE AGAINST HIM. | NUMERIC | 2 | V480015A |
| V480015b | WHY PPL VTD AGNST TRUMAN 2 | Q. 6. DO YOU THINK THERE WAS ANYTHING SPECIAL ABOUT TRUMAN THAT MADE SOME PEOPLE VOTE AGAINST HIM. | NUMERIC | 2 | V480015B |
| V480031a | GRPS IDENTIFIED W TRUMAN 1 | 1. GROUPS IDENTIFIED WITH TRUMAN | NUMERIC | 2 | V480031A |
| V480031b | GRPS IDENTIFIED W TRUMAN 2 | 1. GROUPS IDENTIFIED WITH TRUMAN | NUMERIC | 2 | V480031B |
| V480031c | GRPS IDENTIFIED W TRUMAN 3 | 1. GROUPS IDENTIFIED WITH TRUMAN | NUMERIC | 2 | V480031C |
| V480033a | ISSUES CONNECTED W TRMN 1 | 1. ISSUES MENTIONED IN CONNECTION WITH TRUMAN | NUMERIC | 2 | V480033A |
| V480033b | ISSUES CONNECTED W TRMN 2 | 1. ISSUES MENTIONED IN CONNECTION WITH TRUMAN | NUMERIC | 2 | V480033B |
| V480035a | PERSONAL ATTRIBUTE TRMN 1 | 1. PERSONAL ATTRIBUTES OF TRUMAN | NUMERIC | 2 | V480035A |
| V480035b | PERSONAL ATTRIBUTE TRMN 2 | 1. PERSONAL ATTRIBUTES OF TRUMAN | NUMERIC | 2 | V480035B |
| V480006 | R REMEMBER PREVIOUS INT | INTERVIEWER -- DID THE RESPONDENT REMEMBER BEING INTERVIEWED PREVIOUSLY. | NUMERIC | 1 | V480006 |
| V480007 | INTR INTERVIEW THIS R | INTERVIEWER -- DID YOU INTERVIEW THIS RESPONDENT. | NUMERIC | 1 | V480007 |
| V480008 | PRVS PRE-ELCTN R REINT | INTERVIEWER -- WAS THE PREVIOUS PRE-ELECTION RESPONDENT RE-INTERVIEWED. | NUMERIC | 1 | V480008 |
| V480045 | SEX OF RESPONDENT | Q. 1. SEX OF RESPONDENT | NUMERIC | 1 | V480045 |
| V480046 | RACE OF RESPONDENT | Q. 2. RACE OF RESPONDENT | NUMERIC | 1 | V480046 |
| V480047 | AGE OF RESPONDENT | Q. 3. AGE OF RESPONDENT | NUMERIC | 1 | V480047 |
| V480048 | EDUCATION OF RESPONDENT | Q. 4. EDUCATION OF RESPONDENT | NUMERIC | 1 | V480048 |
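
To see why a given variable was returned, the dictionary can also be searched locally. The following is a minimal sketch in base R and is not part of the rds package: it retypes four rows of the dictionary by hand (one question shortened) and looks for a keyword in either the label or the question text. The helper name `matchKeyword` is hypothetical, and how RDS itself resolves the `$truman` and `$respondent` selectors is not shown here.

```r
# Four rows copied by hand from the dictionary (stand-in for variableDocs);
# the V480014a question is shortened to its first sentence
dict <- data.frame(
  id = c("V480014a", "V480033a", "V480006", "V480045"),
  label = c("WHY PPL VTD FOR TRUMAN 1", "ISSUES CONNECTED W TRMN 1",
            "R REMEMBER PREVIOUS INT", "SEX OF RESPONDENT"),
  question = c("Q. 5. WHY DO YOU THINK PEOPLE VOTED FOR TRUMAN.",
               "1. ISSUES MENTIONED IN CONNECTION WITH TRUMAN",
               "INTERVIEWER -- DID THE RESPONDENT REMEMBER BEING INTERVIEWED PREVIOUSLY.",
               "Q. 1. SEX OF RESPONDENT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: ids whose label or question contains the keyword,
# ignoring case
matchKeyword <- function(dict, keyword) {
  hit <- grepl(keyword, dict$label, ignore.case = TRUE) |
    grepl(keyword, dict$question, ignore.case = TRUE)
  dict$id[hit]
}

trumanIds <- matchKeyword(dict, "truman")
respondentIds <- matchKeyword(dict, "respondent")
```

In this sample, V480033a matches "truman" only through its question text, because its label abbreviates the name to TRMN.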

Data

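| V480014a | V480014b | V480015a | V480015b | V480031a | V480031b | V480031c | V480033a | V480033b | V480035a | V480035b | V480006 | V480007 | V480008 | V480045 | V480046 | V480047 | V480048 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 91 | 98 | 91 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 9 | 9 | 1 | 1 | 3 | 1 |
| 30 | 50 | 30 | 91 | 13 | 11 | 0 | 83 | 0 | 22 | 0 | 1 | 2 | 1 | 2 | 1 | 3 | 2 |
| 10 | 30 | 30 | 91 | 10 | 0 | 0 | 0 | 0 | 23 | 0 | 1 | 1 | 1 | 2 | 1 | 2 | 2 |
| 30 | 91 | 10 | 91 | 11 | 0 | 0 | 0 | 0 | 22 | 0 | 1 | 1 | 1 | 2 | 1 | 3 | 3 |
| 30 | 60 | 10 | 91 | 11 | 12 | 0 | 19 | 83 | 22 | 0 | 1 | 2 | 1 | 1 | 1 | 2 | 3 |
| 30 | 91 | 99 | 91 | 12 | 11 | 0 | 68 | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 1 | 3 | 2 |
| 98 | 91 | 90 | 91 | 0 | 0 | 0 | 0 | 0 | 23 | 0 | 1 | 1 | 1 | 1 | 1 | 4 | 1 |
| 50 | 90 | 90 | 91 | 10 | 12 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 1 | 5 | 1 |
| 50 | 30 | 30 | 90 | 10 | 0 | 0 | 0 | 0 | 23 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 2 |
| 30 | 90 | 90 | 91 | 0 | 0 | 0 | 0 | 0 | 12 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 2 |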
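
The column headers in this table are variable ids, which are hard to read on their own. As a minimal sketch in base R, assuming the labels are copied by hand from the dictionary, a display copy of the data can carry the labels as column headers while the original keeps its ids. The names `labels` and `labelled` are illustrative only.

```r
# First three rows of three columns, copied from the data table
data <- data.frame(V480045 = c(1, 2, 2), V480046 = c(1, 1, 1), V480047 = c(3, 3, 2))

# Matching labels, copied from the dictionary and keyed by variable id
labels <- c(V480045 = "SEX OF RESPONDENT",
            V480046 = "RACE OF RESPONDENT",
            V480047 = "AGE OF RESPONDENT")

# Display copy with labels as column headers; `data` keeps the original ids
labelled <- data
names(labelled) <- unname(labels[names(data)])
```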

Data With Meaning

We may want to know what the codes of one or more of these variables mean. There are two ways to find out.

First, we could inject the code labels into the returned data set using the inject parameter.

dataSet <- select("http://richdataservices.com/public/api/catalog/", "test", 
    "anes1948", cols = "$truman,$respondent", limit = 10, inject = TRUE, autoPage=FALSE)
data <- dataSet@data
dataTable <- sjPlot::sjt.df(data, useViewer = F, describe = FALSE, encoding = "UTF-8", 
    no.output = TRUE, altr.row.col = TRUE, show.rownames = FALSE)$knitr

| V480014a | V480014b | V480015a | V480015b | V480031a | V480031b | V480031c | V480033a | V480033b | V480035a | V480035b | V480006 | V480007 | V480008 | V480045 | V480046 | V480047 | V480048 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TRUMAN PRO-LABOR, NEGRO, WORKING MAN | NO SECOND REASON | DK | NO SECOND REASON | COMMON MAN, LITTLE PEOPLE | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | NO ATTRIBUTE MENTIONED | NO ATTRIBUTE MENTIONED | YES | NA | NA | MALE | WHITE | 35-44 | GRADE SCHOOL |
| TRUMAN PRO-LABOR, NEGRO, WORKING MAN | DEMOCRATS MEAN PROSPERITY, REPUBLICANS | CAN'T GET THINGS DONE | NO SECOND REASON | GOVERNMENT WORKER | ORGANIZED LABOR, UNIONS | NO GROUP, OR NO SECOND, THIRD GROUP | TAFT-HARTLEY | NO ISSUE | SMALL, INCOMPETENT, INEFFICIENT | NO ATTRIBUTE MENTIONED | YES | NO | PREVIOUS RESPONDENT INTERVIEWED | FEMALE | WHITE | 35-44 | HIGH SCHOOL |
| BETTER MAN | TRUMAN PRO-LABOR, NEGRO, WORKING MAN | CAN'T GET THINGS DONE | NO SECOND REASON | COMMON MAN, LITTLE PEOPLE | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | INDECISIVE, VACILLATING, DOESN'T KNOW H | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | FEMALE | WHITE | 25-34 | HIGH SCHOOL |
| TRUMAN PRO-LABOR, NEGRO, WORKING MAN | NO SECOND REASON | SMALL MAN, INADEQUATE BACKGROUND | NO SECOND REASON | ORGANIZED LABOR, UNIONS | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | SMALL, INCOMPETENT, INEFFICIENT | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | FEMALE | WHITE | 35-44 | COLLEGE |
| TRUMAN PRO-LABOR, NEGRO, WORKING MAN | TRUMAN PRO RENT CONTROL, PRICE CONTROL, | SMALL MAN, INADEQUATE BACKGROUND | NO SECOND REASON | ORGANIZED LABOR, UNIONS | FARMER | NO GROUP, OR NO SECOND, THIRD GROUP | FARM PRICES AND SUPPORT | TAFT-HARTLEY | SMALL, INCOMPETENT, INEFFICIENT | NO ATTRIBUTE MENTIONED | YES | NO | PREVIOUS RESPONDENT INTERVIEWED | MALE | WHITE | 25-34 | COLLEGE |
| TRUMAN PRO-LABOR, NEGRO, WORKING MAN | NO SECOND REASON | NA | NO SECOND REASON | FARMER | ORGANIZED LABOR, UNIONS | NO GROUP, OR NO SECOND, THIRD GROUP | TAFT-HARTLEY | NO ISSUE | NO ATTRIBUTE MENTIONED | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | FEMALE | WHITE | 35-44 | HIGH SCHOOL |
| DK | NO SECOND REASON | OTHER REASONS | NO SECOND REASON | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | INDECISIVE, VACILLATING, DOESN'T KNOW H | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | MALE | WHITE | 45-54 | GRADE SCHOOL |
| DEMOCRATS MEAN PROSPERITY, REPUBLICANS | OTHER | OTHER REASONS | NO SECOND REASON | COMMON MAN, LITTLE PEOPLE | FARMER | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | NO ATTRIBUTE MENTIONED | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | FEMALE | WHITE | 55-64 | GRADE SCHOOL |
| DEMOCRATS MEAN PROSPERITY, REPUBLICANS | TRUMAN PRO-LABOR, NEGRO, WORKING MAN | CAN'T GET THINGS DONE | OTHER REASONS | COMMON MAN, LITTLE PEOPLE | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | INDECISIVE, VACILLATING, DOESN'T KNOW H | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | MALE | WHITE | 25-34 | HIGH SCHOOL |
| TRUMAN PRO-LABOR, NEGRO, WORKING MAN | OTHER | OTHER REASONS | NO SECOND REASON | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | EXPERIENCED, CAPABLE, COMPETENT, INTELL | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | MALE | WHITE | 25-34 | HIGH SCHOOL |

Sex of Respondent: Classification

We could also simply access the classification information for a given variable.

# Variable information
metadata <- dataSet@metadata
V480045 <- rds:::variable(metadata, "V480045")
classification <- rds:::classification(metadata, V480045$classification)
classTable <- sjPlot::sjt.df(classification@codes, useViewer = F, describe = FALSE, 
    encoding = "UTF-8", no.output = TRUE, altr.row.col = TRUE, show.rownames = FALSE)$knitr

| value | label |
|---|---|
| 1 | MALE |
| 2 | FEMALE |
| 9 | NA |
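
With a classification in hand, raw codes can also be decoded on the client side. The sketch below is a base R illustration only; it does not show how the inject parameter works inside RDS. It retypes the V480045 classification and the V480045 column of the ten rows returned by the first query, then looks up each code's label. The names `codes`, `sexCodes`, `sexLabels`, and `sexFactor` are illustrative.

```r
# Classification for V480045 (value/label pairs retyped by hand)
codes <- data.frame(value = c(1, 2, 9),
                    label = c("MALE", "FEMALE", "NA"),
                    stringsAsFactors = FALSE)

# V480045 column of the ten rows returned by the first query
sexCodes <- c(1, 2, 2, 2, 1, 2, 1, 2, 1, 1)

# Look up the label for each code; match() yields NA for any code
# that is missing from the classification
sexLabels <- codes$label[match(sexCodes, codes$value)]

# The same information as a factor, with levels ordered as in the classification
sexFactor <- factor(sexCodes, levels = codes$value, labels = codes$label)
```

The decoded values agree with the V480045 column of the injected data set: MALE, FEMALE, FEMALE, FEMALE, MALE, FEMALE, MALE, FEMALE, MALE, MALE.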

Tabulating Data

Perhaps we would like to know whether male and female respondents differ in why they think people voted for Truman. First, let's create the table using the tabulate function.

tabulation <- tabulate("http://richdataservices.com/public/api/catalog/", "test", 
    "anes1948", dimensions = "V480045,V480014a", inject = TRUE)
data <- tabulation@data
table <- sjPlot::sjt.df(data, useViewer = F, describe = FALSE, encoding = "UTF-8", 
    no.output = TRUE, altr.row.col = TRUE, show.rownames = FALSE)$knitr

| V480045 | V480014a | count |
|---|---|---|
| MALE | BETTER MAN | 14 |
| MALE | EXPERIENCED, GOOD RECORD | 22 |
| MALE | TRUMAN PRO-LABOR, NEGRO, WORKING MAN | 94 |
| MALE | GOOD CAMPAIGN CONDUCTED BY TRUMAN | 20 |
| MALE | DEMOCRATS MEAN PROSPERITY, REPUBLICANS | 41 |
| MALE | TRUMAN PRO RENT CONTROL, PRICE CONTROL, | 22 |
| MALE | ROOSEVELT TRADITION | 4 |
| MALE | PERSONAL ATTRIBUTES | 5 |
| MALE | OTHER | 61 |
| MALE | DK | 13 |
| MALE | NA | 6 |
| FEMALE | BETTER MAN | 35 |
| FEMALE | EXPERIENCED, GOOD RECORD | 34 |
| FEMALE | TRUMAN PRO-LABOR, NEGRO, WORKING MAN | 88 |
| FEMALE | GOOD CAMPAIGN CONDUCTED BY TRUMAN | 19 |
| FEMALE | DEMOCRATS MEAN PROSPERITY, REPUBLICANS | 39 |
| FEMALE | TRUMAN PRO RENT CONTROL, PRICE CONTROL, | 17 |
| FEMALE | ROOSEVELT TRADITION | 10 |
| FEMALE | PERSONAL ATTRIBUTES | 15 |
| FEMALE | OTHER | 58 |
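
The tabulation comes back in long form, with one row per combination of the two dimensions. For a side-by-side comparison it can be pivoted so that each reason is a row and each sex a column. This is a minimal sketch in base R, assuming the three-column layout of the tabulation output and using six rows retyped by hand; `tab` and `wide` are illustrative names.

```r
# Six rows copied by hand from the tabulation output
tab <- data.frame(
  V480045 = c("MALE", "MALE", "MALE", "FEMALE", "FEMALE", "FEMALE"),
  V480014a = c("BETTER MAN", "EXPERIENCED, GOOD RECORD", "ROOSEVELT TRADITION",
               "BETTER MAN", "EXPERIENCED, GOOD RECORD", "ROOSEVELT TRADITION"),
  count = c(14, 22, 4, 35, 34, 10),
  stringsAsFactors = FALSE
)

# Pivot to wide form: one row per reason, one column per sex.
# xtabs() sums `count` within each cell; here each cell has a single row.
wide <- xtabs(count ~ V480014a + V480045, data = tab)
```

Individual cells can then be read by name, for example `wide["BETTER MAN", "FEMALE"]`.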

Turning Data into Charts (Visualizing)

Because the tabulate function returns a data set that contains both the data and the metadata, we can create a chart with minimal effort. Simply plug the metadata and data into the appropriate places and apply any other formatting needed to display the chart.

# get the metadata from the previously returned tabulation, which applies to both the
# male and female data
metadata <- tabulation@metadata
V480045 <- rds:::variable(metadata, "V480045")
V480014a <- rds:::variable(metadata, "V480014a")
classification <- rds:::classification(metadata, V480045$classification)

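# plyr provides ddply; ggplot2 provides the plotting functions used below
library(plyr)
library(ggplot2)

# compute the percentage of male and female responses within each category,
# and the midpoint of each bar segment for placing the text labels
data <- ddply(data, .(V480014a), transform, percent = count/sum(count) * 100)
data <- ddply(data, .(V480014a), transform, pos = (cumsum(count) - 0.5 * count))
data$label <- paste0(sprintf("%.0f", data$percent), "%")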

# plot the data
ggplot(data, aes(x = factor(V480014a), y = count, fill = V480045)) + geom_bar(stat = "identity") + 
    geom_text(aes(y = pos, label = label), size = 3) + theme(axis.text.x = element_text(angle = 90, 
    hjust = 1, vjust = 0.5), axis.text.y = element_text(size = 12), legend.position = "top") + 
    xlab(V480014a$label) + coord_flip()
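
The two ddply calls can also be written without plyr. The following is a minimal sketch in base R, assuming the same column names as the tabulation and using two categories retyped by hand. `ave` applies a function within each group and returns the results in the original row order, so the new columns line up with the existing rows.

```r
# Two categories copied by hand from the tabulation
data <- data.frame(
  V480045 = c("MALE", "FEMALE", "MALE", "FEMALE"),
  V480014a = c("BETTER MAN", "BETTER MAN", "ROOSEVELT TRADITION", "ROOSEVELT TRADITION"),
  count = c(14, 35, 4, 10),
  stringsAsFactors = FALSE
)

# Percentage of male and female responses within each category
data$percent <- ave(data$count, data$V480014a,
                    FUN = function(x) x / sum(x) * 100)

# Midpoint of each segment when counts are stacked in row order within a category
data$pos <- ave(data$count, data$V480014a,
                FUN = function(x) cumsum(x) - 0.5 * x)

# Text label such as "29%"
data$label <- paste0(sprintf("%.0f", data$percent), "%")
```

For these rows the labels come out as 29% and 71% in both categories, since 14 of 49 and 4 of 14 both round to 29%.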

Contribute

Putting this product together and maintaining the repository takes time and resources. We welcome your support in any shape or form, in particular:

License

This work is licensed under the BSD-3 License. See LICENSE file for details.



mtna/rrds documentation built on May 23, 2019, 8:19 a.m.