This R library facilitates access to and integration with the MTNA Rich Data Services (RDS) API, enabling immediate access to RDS-backed datasets and metadata collections. RDS provides an on-premises or cloud-based solution for concurrently accessing, querying, tabulating, and packaging data and metadata through a flexible REST API.
Retrieving data and metadata for analytical, reporting, or visualization purposes is typically a time- and resource-consuming process that involves several steps, such as:
- Locating and downloading the data
- Converting and loading it into R
- Computing subsets or aggregations
- Finding relevant documentation
- Manually transcribing codes, classifications, labels, and other descriptive elements into R objects
RDS simplifies this process by offering a powerful REST API that performs all of the above in a single step! There is no need to download data, convert across formats, or spend hours skimming through cryptic PDF, Excel, Word, and other legacy files for documentation.
RDS combines on-the-fly querying and tabulation capabilities with metadata retrieval features. Comprehensive variable and classification metadata can accompany any data queried through RDS, enabling immediate reuse and rendering.
Visit the RDS web site for detailed information on the platform's capabilities, or learn more about how to complement and deliver your data to your users.
# We are working on getting into the CRAN repository; for now, local installs are necessary.
# install() is provided by the devtools package.
library(devtools)
setwd("rds.r")
install("rds.r")
In the examples below, we will use data from the 1948 American National Election Study (ANES) dataset hosted on MTNA's public RDS server. This simple dataset contains 662 records and 65 variables/columns.
Let's imagine that we are researching the 1948 United States presidential election between Harry Truman and Thomas Dewey. We are interested in the respondents' demographic data and in why they did or did not vote for Harry Truman. We could look at the documentation on the ANES website, which has a lot of good information about the variables and their codes, stored in an ASCII format.
However, to identify the variables of interest, we would need to read through the name and label of every variable and pick out those that relate to respondent demographics or to questions about Truman.
RDS saves you the trouble: instead of manually reading the variable information, we can search by keyword and have the matching variables and their metadata returned to us. The response contains more than data; the variable and classification metadata are available as well. This lets us document the variables we are using and gives ourselves and others more context around the data.
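As a rough local analogy for the keyword search, the sketch below filters a small variable catalog by label. It assumes a keyword matches case-insensitively anywhere in a variable's label; the matching rules RDS actually applies to `$keyword` columns are not described here. The `catalog` data frame, its ids and labels, and the `find_variables` helper are all hypothetical and exist only for illustration.

```r
# Hypothetical variable catalog; ids and labels are made up, not the real ANES metadata.
catalog <- data.frame(
  id = c("X1", "X2", "X3", "X4"),
  label = c("Reason people voted for Truman",
            "Sex of respondent",
            "Region of interview",
            "Race of respondent"),
  stringsAsFactors = FALSE
)

# Return the ids of variables whose label contains any of the keywords
# (case-insensitive, literal substring match).
find_variables <- function(catalog, keywords) {
  labels <- tolower(catalog$label)
  hits <- vapply(
    keywords,
    function(k) grepl(tolower(k), labels, fixed = TRUE),
    logical(length(labels))
  )
  # Force a matrix so rowSums() also works for a one-row catalog.
  hits <- matrix(hits, nrow = length(labels))
  catalog$id[rowSums(hits) > 0]
}

matched <- find_variables(catalog, c("truman", "respondent"))
print(matched)
```

Only the variables whose labels mention one of the keywords are kept, which is the manual step the keyword columns are meant to replace.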
# For the purposes of this example, we use the 'select' function with autoPage turned off
# so that the table does not get too big for the HTML. For analysis, we would use
# the 'select' function to return the entire data set:
# data <- select('http://richdataservices.com/public/api/catalog/', 'test', 'anes1948', cols = '$truman,$respondent')
# We will limit the variable properties returned to make our data dictionary more
# visually appealing
varProperties <- "id,label,question,storageType,width,classification"
dataSet <- select("http://richdataservices.com/public/api/catalog/", "test",
"anes1948", cols = "$truman,$respondent", limit = 10, varProperties = varProperties, autoPage = FALSE)
# Variable information
metadata <- dataSet@metadata
variableDocs <- rds:::variables(metadata)
varTable <- sjPlot::sjt.df(variableDocs, useViewer = FALSE, describe = FALSE, encoding = "UTF-8",
no.output = TRUE, altr.row.col = TRUE, show.rownames = FALSE)$knitr
# Data
data <- dataSet@data
dataTable <- sjPlot::sjt.df(data, useViewer = FALSE, describe = FALSE, encoding = "UTF-8",
no.output = TRUE, altr.row.col = TRUE, show.rownames = FALSE)$knitr
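The effect of `limit` and `autoPage` can be pictured with a local sketch. It assumes, without confirmation from the RDS documentation, that automatic paging amounts to requesting `limit`-sized pages until the data runs out. `fetch_page` and `fetch_all` are hypothetical helpers that slice a local data frame rather than calling the server; only the record count of 662 comes from the dataset description above.

```r
# Hypothetical stand-in for one request: returns up to `limit` rows after `offset`.
fetch_page <- function(source, offset, limit) {
  if (offset >= nrow(source)) {
    return(source[0, , drop = FALSE])
  }
  end <- min(offset + limit, nrow(source))
  source[(offset + 1):end, , drop = FALSE]
}

# Keep requesting pages until an empty or short page signals the end.
fetch_all <- function(source, limit) {
  pages <- list()
  offset <- 0
  repeat {
    page <- fetch_page(source, offset, limit)
    if (nrow(page) == 0) break
    pages[[length(pages) + 1]] <- page
    offset <- offset + nrow(page)
    if (nrow(page) < limit) break
  }
  do.call(rbind, pages)
}

# Local stand-in for a 662-record dataset.
records <- data.frame(row = seq_len(662))

first_page <- fetch_page(records, 0, 10)  # analogous to limit = 10 without paging
everything <- fetch_all(records, 100)     # analogous to letting paging run to the end
print(c(nrow(first_page), nrow(everything)))
```

With paging off, a single small page keeps the rendered table short; with paging on, the loop accumulates every record.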
We may also want to know what the codes of one or more of these variables mean. There are two ways to find out.
First, we could inject the code labels into the returned data set using the inject parameter.
dataSet <- select("http://richdataservices.com/public/api/catalog/", "test",
"anes1948", cols = "$truman,$respondent", limit = 10, inject = TRUE, autoPage=FALSE)
data <- dataSet@data
dataTable <- sjPlot::sjt.df(data, useViewer = FALSE, describe = FALSE, encoding = "UTF-8",
no.output = TRUE, altr.row.col = TRUE, show.rownames = FALSE)$knitr
| V480014a | V480014b | V480015a | V480015b | V480031a | V480031b | V480031c | V480033a | V480033b | V480035a | V480035b | V480006 | V480007 | V480008 | V480045 | V480046 | V480047 | V480048 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TRUMAN PRO-LABOR, NEGRO, WORKING MAN | NO SECOND REASON | DK | NO SECOND REASON | COMMON MAN, LITTLE PEOPLE | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | NO ATTRIBUTE MENTIONED | NO ATTRIBUTE MENTIONED | YES | NA | NA | MALE | WHITE | 35-44 | GRADE SCHOOL |
| TRUMAN PRO-LABOR, NEGRO, WORKING MAN | DEMOCRATS MEAN PROSPERITY, REPUBLICANS | CAN'T GET THINGS DONE | NO SECOND REASON | GOVERNMENT WORKER | ORGANIZED LABOR, UNIONS | NO GROUP, OR NO SECOND, THIRD GROUP | TAFT-HARTLEY | NO ISSUE | SMALL, INCOMPETENT, INEFFICIENT | NO ATTRIBUTE MENTIONED | YES | NO | PREVIOUS RESPONDENT INTERVIEWED | FEMALE | WHITE | 35-44 | HIGH SCHOOL |
| BETTER MAN | TRUMAN PRO-LABOR, NEGRO, WORKING MAN | CAN'T GET THINGS DONE | NO SECOND REASON | COMMON MAN, LITTLE PEOPLE | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | INDECISIVE, VACILLATING, DOESN'T KNOW H | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | FEMALE | WHITE | 25-34 | HIGH SCHOOL |
| TRUMAN PRO-LABOR, NEGRO, WORKING MAN | NO SECOND REASON | SMALL MAN, INADEQUATE BACKGROUND | NO SECOND REASON | ORGANIZED LABOR, UNIONS | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | SMALL, INCOMPETENT, INEFFICIENT | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | FEMALE | WHITE | 35-44 | COLLEGE |
| TRUMAN PRO-LABOR, NEGRO, WORKING MAN | TRUMAN PRO RENT CONTROL, PRICE CONTROL, | SMALL MAN, INADEQUATE BACKGROUND | NO SECOND REASON | ORGANIZED LABOR, UNIONS | FARMER | NO GROUP, OR NO SECOND, THIRD GROUP | FARM PRICES AND SUPPORT | TAFT-HARTLEY | SMALL, INCOMPETENT, INEFFICIENT | NO ATTRIBUTE MENTIONED | YES | NO | PREVIOUS RESPONDENT INTERVIEWED | MALE | WHITE | 25-34 | COLLEGE |
| TRUMAN PRO-LABOR, NEGRO, WORKING MAN | NO SECOND REASON | NA | NO SECOND REASON | FARMER | ORGANIZED LABOR, UNIONS | NO GROUP, OR NO SECOND, THIRD GROUP | TAFT-HARTLEY | NO ISSUE | NO ATTRIBUTE MENTIONED | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | FEMALE | WHITE | 35-44 | HIGH SCHOOL |
| DK | NO SECOND REASON | OTHER REASONS | NO SECOND REASON | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | INDECISIVE, VACILLATING, DOESN'T KNOW H | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | MALE | WHITE | 45-54 | GRADE SCHOOL |
| DEMOCRATS MEAN PROSPERITY, REPUBLICANS | OTHER | OTHER REASONS | NO SECOND REASON | COMMON MAN, LITTLE PEOPLE | FARMER | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | NO ATTRIBUTE MENTIONED | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | FEMALE | WHITE | 55-64 | GRADE SCHOOL |
| DEMOCRATS MEAN PROSPERITY, REPUBLICANS | TRUMAN PRO-LABOR, NEGRO, WORKING MAN | CAN'T GET THINGS DONE | OTHER REASONS | COMMON MAN, LITTLE PEOPLE | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | INDECISIVE, VACILLATING, DOESN'T KNOW H | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | MALE | WHITE | 25-34 | HIGH SCHOOL |
| TRUMAN PRO-LABOR, NEGRO, WORKING MAN | OTHER | OTHER REASONS | NO SECOND REASON | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO GROUP, OR NO SECOND, THIRD GROUP | NO ISSUE | NO ISSUE | EXPERIENCED, CAPABLE, COMPETENT, INTELL | NO ATTRIBUTE MENTIONED | YES | YES | PREVIOUS RESPONDENT INTERVIEWED | MALE | WHITE | 25-34 | HIGH SCHOOL |
Second, we could access the classification information for a given variable directly.
# Variable information
metadata <- dataSet@metadata
V480045 <- rds:::variable(metadata, "V480045")
classification <- rds:::classification(metadata, V480045$classification)
classTable <- sjPlot::sjt.df(classification@codes, useViewer = FALSE, describe = FALSE,
encoding = "UTF-8", no.output = TRUE, altr.row.col = TRUE, show.rownames = FALSE)$knitr
| value | label |
|---|---|
| 1 | MALE |
| 2 | FEMALE |
| 9 | NA |
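Once a code table like this one is available locally, the labels can also be attached on the client side. The following is a minimal base R sketch of that lookup; the `codes` data frame copies the three value/label pairs shown for V480045, while the `raw` vector of coded values is a hypothetical sample, not data returned by RDS.

```r
# Code table for V480045, copied from the classification shown above.
codes <- data.frame(
  value = c(1, 2, 9),
  label = c("MALE", "FEMALE", "NA"),
  stringsAsFactors = FALSE
)

# Hypothetical raw coded values, as they might look without label injection.
raw <- c(1, 2, 2, 9, 1)

# Look up each raw code in the code table; codes absent from the table
# would come back as a missing value (NA_character_).
labelled <- codes$label[match(raw, codes$value)]
print(labelled)
```

Note that the label for code 9 is the literal string "NA" taken from the classification, which is different from R's missing value.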
Perhaps we would like to know whether male and female respondents differ in why they think people voted for Truman. First, let's create the table using the tabulate function.
tabulation <- tabulate("http://richdataservices.com/public/api/catalog/", "test",
"anes1948", dimensions = "V480045,V480014a", inject = TRUE)
data <- tabulation@data
table <- sjPlot::sjt.df(data, useViewer = FALSE, describe = FALSE, encoding = "UTF-8",
no.output = TRUE, altr.row.col = TRUE, show.rownames = FALSE)$knitr
| V480045 | V480014a | count |
|---|---|---|
| MALE | BETTER MAN | 14 |
| MALE | EXPERIENCED, GOOD RECORD | 22 |
| MALE | TRUMAN PRO-LABOR, NEGRO, WORKING MAN | 94 |
| MALE | GOOD CAMPAIGN CONDUCTED BY TRUMAN | 20 |
| MALE | DEMOCRATS MEAN PROSPERITY, REPUBLICANS | 41 |
| MALE | TRUMAN PRO RENT CONTROL, PRICE CONTROL, | 22 |
| MALE | ROOSEVELT TRADITION | 4 |
| MALE | PERSONAL ATTRIBUTES | 5 |
| MALE | OTHER | 61 |
| MALE | DK | 13 |
| MALE | NA | 6 |
| FEMALE | BETTER MAN | 35 |
| FEMALE | EXPERIENCED, GOOD RECORD | 34 |
| FEMALE | TRUMAN PRO-LABOR, NEGRO, WORKING MAN | 88 |
| FEMALE | GOOD CAMPAIGN CONDUCTED BY TRUMAN | 19 |
| FEMALE | DEMOCRATS MEAN PROSPERITY, REPUBLICANS | 39 |
| FEMALE | TRUMAN PRO RENT CONTROL, PRICE CONTROL, | 17 |
| FEMALE | ROOSEVELT TRADITION | 10 |
| FEMALE | PERSONAL ATTRIBUTES | 15 |
| FEMALE | OTHER | 58 |
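To make the shape of this result concrete, the sketch below counts the same two dimensions locally with base R. It is not how RDS computes the tabulation; it only reproduces the idea on the ten records from the injected sample shown earlier, so the counts are much smaller than the server-side ones. The `sample_rows` and `counts` names are introduced here for illustration.

```r
# V480045 and V480014a for the ten records in the injected sample shown earlier.
sample_rows <- data.frame(
  V480045 = c("MALE", "FEMALE", "FEMALE", "FEMALE", "MALE",
              "FEMALE", "MALE", "FEMALE", "MALE", "MALE"),
  V480014a = c(
    "TRUMAN PRO-LABOR, NEGRO, WORKING MAN",
    "TRUMAN PRO-LABOR, NEGRO, WORKING MAN",
    "BETTER MAN",
    "TRUMAN PRO-LABOR, NEGRO, WORKING MAN",
    "TRUMAN PRO-LABOR, NEGRO, WORKING MAN",
    "TRUMAN PRO-LABOR, NEGRO, WORKING MAN",
    "DK",
    "DEMOCRATS MEAN PROSPERITY, REPUBLICANS",
    "DEMOCRATS MEAN PROSPERITY, REPUBLICANS",
    "TRUMAN PRO-LABOR, NEGRO, WORKING MAN"
  ),
  stringsAsFactors = FALSE
)

# Give every record a weight of one, then sum the weights per combination
# of the two dimensions to obtain one row per (sex, reason) pair.
sample_rows$count <- 1L
counts <- aggregate(count ~ V480045 + V480014a, data = sample_rows, FUN = sum)
print(counts)
```

The result has the same three columns as the tabulation above: one column per dimension plus a count.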
Because the tabulate function returns a data set that contains both the data and the metadata, we can create a chart with minimal effort. Simply plug the metadata and data into the appropriate places and apply any other formatting needed to display the chart.
# Load the packages used below: plyr provides ddply(), ggplot2 provides ggplot().
library(plyr)
library(ggplot2)
# Get the metadata from the tabulation returned above; it applies to both the
# male and female data.
metadata <- tabulation@metadata
V480045 <- rds:::variable(metadata, "V480045")
V480014a <- rds:::variable(metadata, "V480014a")
classification <- rds:::classification(metadata, V480045$classification)
# Compute the percentage of male and female responses within each category,
# and the position of each label along the count axis.
data <- ddply(data, .(V480014a), transform, percent = count/sum(count) * 100)
data <- ddply(data, .(V480014a), transform, pos = (cumsum(count) - 0.5 * count))
data$label <- paste0(sprintf("%.0f", data$percent), "%")
# Plot the data
ggplot(data, aes(x = factor(V480014a), y = count, fill = V480045)) + geom_bar(stat = "identity") +
geom_text(aes(y = pos, label = label), size = 3) + theme(axis.text.x = element_text(angle = 90,
hjust = 1, vjust = 0.5), axis.text.y = element_text(size = 12), legend.position = "top") +
xlab(V480014a$label) + coord_flip()
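The percentage and label-position arithmetic can be checked on a few rows without plyr. The sketch below swaps in base R's `ave()` for `ddply()`, which is a substitution made here for illustration rather than the approach used above. The four rows are copied from the tabulation output; `tab` is a name introduced for this example.

```r
# Four rows copied from the tabulation output above.
tab <- data.frame(
  V480045 = c("MALE", "FEMALE", "MALE", "FEMALE"),
  V480014a = c("BETTER MAN", "BETTER MAN",
               "ROOSEVELT TRADITION", "ROOSEVELT TRADITION"),
  count = c(14, 35, 4, 10),
  stringsAsFactors = FALSE
)

# Share of each sex within each response category.
tab$percent <- tab$count / ave(tab$count, tab$V480014a, FUN = sum) * 100

# Midpoint of each segment along the count axis, used to place the label.
tab$pos <- ave(tab$count, tab$V480014a, FUN = cumsum) - 0.5 * tab$count

# Percentage rendered as a whole number with a percent sign.
tab$label <- paste0(sprintf("%.0f", tab$percent), "%")
print(tab)
```

For "BETTER MAN", for example, 14 of the 49 responses come from male respondents, so the male segment is labelled 29% and its label sits at 7, the midpoint of the first 14 counts.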
Putting this product together and maintaining the repository takes time and resources. We welcome your support in any shape or form.
This work is licensed under the BSD-3 License. See LICENSE file for details.