knitr::opts_chunk$set(comment = "#>", fig.align='center', fig.width = 7, fig.height = 5)
library(Rvoteview)
This package provides tools to query and download from the VoteView database. This vignette demonstrates the different types of queries that can be used, how Rvoteview can be used to do ideal point estimation on a subset of votes using the pscl package and the wnominate package, and how Rvoteview facilitates regression analyses of congressional voting behavior.
voteview_search
voteview_download
a. Joining rollcall objects
b. Melting rollcall objects
c. Completing interrupted downloads
d. Retrieving member data

To install this package, ensure you have devtools installed. If you do not, run install.packages("devtools") and then install from GitHub using
devtools::install_github("voteview/Rvoteview")
For a quick start, see the README in the GitHub repository (voteview/Rvoteview).
voteview_search
The first main function of this package allows users to search for roll calls. Using a custom query parser, we allow both simple and complex queries to be made to the VoteView database. The simple way uses a set of arguments to build a query within the R package, while the complex way allows the user to build a specific query with nested, boolean logic. Both can also be used simultaneously. See the query parser documentation for the full syntax.
The q argument should be treated like an online search box. You can put in text search terms or specific fields with parameters, or leave it blank if other arguments are used. The simplest usage is to treat the q argument as a way to search all text fields. If you want to search for a specific phrase, put the query in quotes. This looks for that exact phrase in many of the text fields in the database. Alternatively, if you search without quotes, each word is stemmed (shortened to its root) and matched against an index of the text fields. For example, we can search for "terrorism" exactly or loosely using the index:
library(Rvoteview)
res <- voteview_search("'terrorism'") # exact
res <- voteview_search("terrorism")   # index based
You can also search for multiple words:
res <- voteview_search("terrorism iraq") # index based search
Using the text index, the MongoDB database that houses the roll calls searches for documents containing either of these words and returns the best matches. In effect, this returns documents that contain "terror", "iraq", or various shortened versions of those words.
When using one of the simple queries above, the query parser automatically prepends a field to any query that does not specify which field to search. To search a specific field, use the fieldname:query syntax. To replicate the last example more explicitly, we use the following:
res <- voteview_search("alltext:terrorism iraq")
Unfortunately, due to the way the query parser works, you cannot search for two exact words at the moment or search in two different specific text fields. You can however look within a specific text field.
res <- voteview_search("vote_desc:'iraq'")
Users can also use other arguments to restrict the search to roll calls in a certain chamber of Congress, within a date range, within a certain set of congresses, or within a level of support, defined as the percent of total valid votes that were yea votes. This is especially useful if users only want to return competitive votes. Note that all fields are joined using "AND" logic. For example, you can search for roll calls that use the keyword "tax" AND were held in the House, but not for roll calls that either use the keyword "tax" OR were held in the House. Also note that the congress field uses "OR" logic within the numeric vector that specifies which congresses to search. No roll call can be in two congresses, so it makes no sense to search for roll calls that are in one congress AND in another congress.
## Search for votes with a start date
## Note that because tax is not in quotes, it searches the text index and not for
## exact matches
res <- voteview_search("tax", startdate = "2005-01-01")

## Search for votes with an end date in just the House
res <- voteview_search("tax", enddate = "2005-01-01", chamber = "House")

## Search for votes with a start date in just the House in the 110th or 112th Congress
res <- voteview_search("tax",
                       startdate = "2000-12-20",
                       congress = c(110, 112),
                       chamber = "House")
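The way the arguments combine can be mimicked on a toy data frame in base R. This is only a sketch of the logic, not the package's implementation or the query the server runs; the data frame, its column names, and its contents are made up for illustration.

```r
## Toy roll calls (made-up data, not from the VoteView database)
rollcalls <- data.frame(
  id       = c("a", "b", "c", "d"),
  congress = c(110, 111, 112, 110),
  chamber  = c("House", "House", "House", "Senate"),
  desc     = c("income tax cut", "estate tax repeal", "tax credit", "tax reform"),
  stringsAsFactors = FALSE
)

## Conditions on different fields are combined with AND ...
is_tax   <- grepl("tax", rollcalls$desc, fixed = TRUE)
is_house <- rollcalls$chamber == "House"

## ... while the congress vector is matched with OR (any listed congress)
in_congress <- rollcalls$congress %in% c(110, 112)

matched <- rollcalls$id[is_tax & is_house & in_congress]
matched
```

Roll call "b" is dropped because it falls in the 111th congress, and "d" is dropped because it was held in the Senate, even though both mention "tax".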
You can always see exactly what search was used to create a set of roll calls by retrieving the 'qstring' attribute of the returned data frame:
attr(res, "qstring")
As previewed before, users can use the q argument to specify complex queries by specifying which fields to search and how to combine fields using boolean logic. See the query parser documentation for the complete syntax. In general, the following syntax is used: field:specific phrase (field:other phrase OR field:second phrase).
For example, if you wanted to find votes where 'war' and 'iraq' were present, but only votes held up to 1993 or from 2000 onward, you could write it like so:
qString <- "alltext:war iraq (enddate:1993 OR startdate:2000)"
res <- voteview_search(q = qString)
Whenever in doubt, add parentheses to make the query clearer!
Numeric fields can be searched in a similar way, although users can also use square brackets and "to" for ranges of numbers. For example, the query for all votes about taxes in the 100th to 102nd congresses could be expressed either using "alltext:taxes congress:100 OR congress:101 OR congress:102" or using "alltext:taxes congress:[100 to 102]". Note that if you want to restrict the search to certain dates, the startdate and enddate arguments in the function should be used.
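When the range of congresses is held in an R vector, either form of the query string can be assembled with base R string functions. This is a small sketch that only builds the strings; the variable names are hypothetical and nothing is sent to the database.

```r
## Congresses of interest
congresses <- 100:102

## Explicit OR form: one congress:<n> term per congress
q_or <- paste("alltext:taxes",
              paste0("congress:", congresses, collapse = " OR "))

## Range form using square brackets and "to"
q_range <- sprintf("alltext:taxes congress:[%d to %d]",
                   min(congresses), max(congresses))

q_or
q_range
```

Either string could then be passed as the q argument of voteview_search.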
For example, here is a query that returns votes from the 100th to 102nd congresses whose text matches "tax" or "iraq" and where the percent of roll call votes in favor is between 45 and 55 percent, inclusive.
qString <- "alltext:tax iraq (congress:[100 to 102] AND support:[45 to 55])"
res <- voteview_search(q = qString)
voteview_download
The second main function of this package allows users to download detailed roll call data into a modified rollcall object from the pscl package. The default usage is to pass voteview_download a vector of roll call id numbers returned by the voteview_search function.
## Search all votes with the exact phrase "estate tax" in the 105th congress
res <- voteview_search("'estate tax' congress:105")

## Download all estate tax votes
rc <- voteview_download(res$id)
summary(rc)
Importantly, the object we return is a modified rollcall object, in that it may contain additional elements that the authors of the pscl package did not include. It therefore works with all of the methods they wrote for rollcall objects as well as some methods we include in this package. The biggest difference between the original rollcall object and what we return is the inclusion of "long" versions of the votes matrix and the legis.data data frame, described below.
First, because ICPSR numbers are not necessarily unique to legislators, we include legis.long.dynamic in the output. For example, when Strom Thurmond changed parties, his ICPSR number also changed. However, ICPSR numbers are the default identifier when building rollcall objects. Therefore, legis.long.dynamic contains a record of every legislator-party-congress as a unique id, as well as the relevant covariates.
Second, we include votes.long, a data frame where the rows are legislator-roll calls and contain how each legislator voted on each roll call. This is the long version of the votes matrix included in all rollcall objects.
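The relationship between a wide votes matrix and a long legislator-roll call table can be sketched in base R. The matrix below is made up, and the column names icpsr, vname, and vote are assumptions chosen for illustration; the actual columns of votes.long may differ.

```r
## Toy votes matrix: rows are legislators, columns are roll calls
## (made-up values; 1 = yea, 6 = nay, 9 = some other code)
votes <- matrix(c(1, 6, 9,
                  1, 1, 6),
                nrow = 3,
                dimnames = list(icpsr = c("101", "102", "103"),
                                vname = c("RH1", "RH2")))

## One row per legislator-roll call pair
votes_long <- as.data.frame(as.table(votes),
                            responseName = "vote",
                            stringsAsFactors = FALSE)
votes_long
```

A matrix with 3 legislators and 2 roll calls yields 6 rows in the long format.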
We also add three methods that can be used on rollcall objects created by our package.
Joining rollcall objects

The first function allows for a full outer join of two rollcall objects downloaded from the VoteView database, creating a new rollcall object that is a union of the two. It is called by using the %+% operator. This is especially useful if the user downloaded two roll call objects at separate times and wants to join them together rather than re-download all of the votes at the same time.
## Detach ggplot2, if it is loaded, so that its %+% does not mask the Rvoteview operator
try(detach("package:ggplot2", unload = TRUE), silent = TRUE)
## Search all votes with exact phrase "estate tax"
res <- voteview_search("'estate tax' congress:105")

## Download first 10 votes
rc1 <- voteview_download(res$id[1:10])
## Download another 10 votes with some overlap
rc2 <- voteview_download(res$id[5:14])

## Merge them together
rcall <- rc1 %+% rc2

rcall$m # The number of total votes
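The vote count to expect from such a join follows from the overlap between the two sets of ids. A minimal base R sketch, using made-up id strings rather than real VoteView ids:

```r
## Made-up roll call ids standing in for res$id
ids <- sprintf("RH105%04d", 1:14)

first_batch  <- ids[1:10]
second_batch <- ids[5:14]

## A union keeps each roll call once, even if it appears in both batches
all_ids    <- union(first_batch, second_batch)
shared_ids <- intersect(first_batch, second_batch)

length(all_ids)
length(shared_ids)
```

Ten votes plus ten votes with six in common gives fourteen distinct roll calls, which is the count a union of the two downloads should contain if the search returned at least fourteen ids.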
Melting rollcall objects

We also provide a function called melt_rollcall, which allows users to produce a long data frame that is essentially the same as votes.long but includes all of the roll call and legislator data on each row.
## Default is to retain all data
rc_long <- melt_rollcall(rcall)
rc_long[1:3, 1:17]

## Retaining fewer columns
rc_long <- melt_rollcall(rcall, votecols = c("chamber", "congress"))
rc_long[1:3, ]
If your internet connection drops in the middle of a download, or you have to interrupt a download for some reason, the voteview_download function tries to finish building the rollcall object with whatever data it has successfully downloaded. Manually interrupting functions in R is tricky and we cannot catch interrupts perfectly, but when the recovery succeeds, or when your connection drops, we store the roll call ids that could not be retrieved in the unretrievedids slot of our modified rollcall object. Users can then use the complete_download function to download the unretrieved ids and create a complete rollcall object. For example, imagine the following download stalls as your wireless cuts out at this cute coffee shop that roasts its beans in house but cannot manage a good wireless connection:
rc_fail <- voteview_download(res$id)
If this fails but still manages to build a rollcall object with whatever ids it was able to retrieve, then we can complete the download with a simple command:
rc <- complete_download(rc_fail)
Again, because of the difficulty of properly catching interrupts in R, this will not always work with manual interrupts, but it should work with dropped internet connections.
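The general pattern of keeping partial results and recording which ids still need to be fetched can be sketched in base R. Everything here is hypothetical: fetch_one and fetch_many are stand-ins written for this illustration and are not functions from Rvoteview, and the simulated failure replaces a real network error.

```r
## Hypothetical stand-in for a network call: fails for ids listed in `down`
fetch_one <- function(id, down = character(0)) {
  if (id %in% down) stop("connection dropped for ", id)
  paste("data for", id)
}

## Fetch what we can and record the ids that failed
fetch_many <- function(ids, down = character(0)) {
  got <- list()
  for (id in ids) {
    res <- tryCatch(fetch_one(id, down), error = function(e) NULL)
    if (!is.null(res)) got[[id]] <- res
  }
  list(data = got, unretrieved = setdiff(ids, names(got)))
}

ids <- c("v1", "v2", "v3", "v4")

## First attempt: the connection "drops" for the last two ids
partial <- fetch_many(ids, down = c("v3", "v4"))
partial$unretrieved

## Second attempt: fetch only what is missing, then combine
retry    <- fetch_many(partial$unretrieved)
complete <- c(partial$data, retry$data)
names(complete)
```

The second call requests only the ids that failed, so nothing that was already downloaded is fetched again.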
There is also the ability to search the database for members (House Representatives, Senators, and Presidents) using the member_search function. Unfortunately, the syntax is not identical to the syntax used when searching for roll calls. Nonetheless, the usage in R is quite simple. There are fields to search members' names, ICPSR number, state (either ICPSR state number, two letter postal code, or the full name), the range of congresses to search within, the CQ label of the member, and the chamber to search within.
The function returns a data frame of metadata, with one row for each legislator-congress that is found (these are the unique entries in the database of members). Therefore, if we want to return all unique legislator-congresses where the name 'clinton' appears anywhere in the name fields, we can use the following search:
clintons <- member_search("clinton")

## Drop the bio field because it is quite long
clintons[1:7, names(clintons) != "bio"]
It is important to note that if there is no white space in the name field, the database is searched for exact matches for that one word. If there are multiple words we use a text index of all of the name fields and return the best matches.
If you only want to return the first record per ICPSR number, you can set the distinct flag equal to one. This is useful because it limits the size of the object returned and most data is duplicated within ICPSR number. For example, CS DW-NOMINATE scores are constant within ICPSR number, as are names and (usually) party.
clintons <- member_search("clinton", state = "NY", distinct = 1)

## Drop the bio field because it is quite long
clintons[, names(clintons) != "bio"]
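The effect of keeping one record per ICPSR number can be reproduced on any legislator-congress data frame with base R. The data below are made up, and this sketch simply keeps whichever record appears first for each ICPSR number; it is not a description of how the database chooses the record it returns.

```r
## Made-up legislator-congress records
members <- data.frame(
  icpsr    = c(1, 1, 2, 2, 2),
  congress = c(107, 108, 102, 103, 104),
  name     = c("Member A", "Member A", "Member B", "Member B", "Member B"),
  stringsAsFactors = FALSE
)

## Keep the first row seen for each ICPSR number
first_per_icpsr <- members[!duplicated(members$icpsr), ]
first_per_icpsr
```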
Some other fields are not constant within ICPSR number and may vary across records: the chamber of the member, their CQ label, and the number of votes they cast. Let's get all the records for Bernie Sanders.
sanders <- member_search("sanders", state = "VT")

## Drop the bio field because it is quite long
sanders[, names(sanders) != "bio"]
As you can see, Sanders changes chambers between the 109th and 110th congresses, and a few other fields differ as well. Nonetheless, most of the data is repeated across records.
This section details three different possible uses of the Rvoteview package, showing users from beginning to end how to conduct their own ideal point estimation and use Rvoteview in more traditional regression analysis.
Imagine that we want to estimate ideal points for all legislators voting on foreign policy during the first six months of Obama's presidency. We will use all roll calls that fit the Clausen category "Foreign and Defense Policy" and are somewhat competitive, meaning between 15 and 85 percent of votes on the floor were yeas.
## Load packages
library(ggplot2) # Load this first so that Rvoteview can use %+%
library(Rvoteview)

## Search database for votes that meet our criteria
res <- voteview_search("codes.Clausen:Foreign and Defense Policy support:[15 to 85]",
                       startdate = "2009-01-20", enddate = "2009-07-20")
## Download votes into rollcall object
rc <- voteview_download(res$id)
summary(rc)
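The competitiveness criterion used in the search can be checked by hand for a single roll call. This sketch uses a made-up vote vector with the codes used later in this vignette (1 for yea, 6 for nay); treating every other code as not a valid vote is an assumption made here for illustration.

```r
## Made-up votes on one roll call: 1 = yea, 6 = nay, 9 = some other code
votes <- c(1, 1, 6, 1, 6, 9, 1, 6, 6, 1)

## Only yeas and nays count as valid votes in this sketch
valid <- votes %in% c(1, 6)

## Support: percent of valid votes that were yeas
support <- 100 * sum(votes == 1) / sum(valid)

## Competitive if support falls between 15 and 85 percent, inclusive
competitive <- support >= 15 && support <= 85

support
competitive
```

With 5 yeas out of 9 valid votes, support is about 55.6 percent, so this roll call would pass the filter.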
Now we use the wnominate package to run an ideal point estimation.
library(wnominate)
# Find extreme legislators for polarity argument
cons1 <- rc$legis.long.dynamic[which.max(rc$legis.data$dim1), c("name", "icpsr")]
cons2 <- rc$legis.long.dynamic[which.max(rc$legis.data$dim2), c("name", "icpsr")]
defIdeal <- wnominate(rc, polarity = list("icpsr", c(20753, 20523)))
This ideal point estimation also returns the estimated points attached to all of the legislator and roll call metadata already in the rc object! This can be useful in creating custom plots.
## Create text party name
defIdeal$legislators$partyName <- ifelse(defIdeal$legislators$party == 200, "Republican",
                                  ifelse(defIdeal$legislators$party == 100, "Democrat", "Independent"))

ggplot(defIdeal$legislators,
       aes(x = coord1D, y = coord2D, color = partyName, label = state_abbrev)) +
  geom_text() +
  scale_color_manual("Party", values = c("Republican" = "red",
                                         "Democrat" = "blue",
                                         "Independent" = "darkgreen")) +
  theme_bw()
We see the usual split between Republicans and Democrats.
We can also use the built-in plot method from wnominate to produce some great figures from our estimation.
# Some great plots!
plot(defIdeal)
The rollcall objects we build can also be used in the ideal function in the pscl package.
library(pscl)
defIdeal <- ideal(rc, d = 2)

## The same call with explicit MCMC settings
defIdeal <- ideal(rc, d = 2, maxiter = 5000, thin = 10, burnin = 1000)
We can also use the pscl plot method.
plot(defIdeal)
Users can also use the VoteView API to run regression analyses. Let's take the state level opinion data on gay rights that was estimated in Lax and Phillips (2009). They used multilevel regression and poststratification on surveys from 1999-2008 in order to estimate state-level explicit support for gay rights issues. Let's pull down some important bills presented before the 111th congress (2009-2011) and see how state level public opinion in the preceding years predicts voting behavior in the legislature.
Let's see what bills there were in the 111th congress that had to do with homosexuality. We can use a search that will capture quite a few different bills.
## Search by issue code within the 111th congress
res <- voteview_search("codes.Issue:Homosexuality congress:111")
res[1:5, 1:10]
To focus on actual bills that were of some consequence, let's take the House and Senate don't ask don't tell votes and the hate crimes bill from the House.
dadt <- voteview_download(c("RH1111621", "RS1110678", "RH1110222"))
dadt$vote.data
Now we want to turn this into a long data frame, where each row is a legislator-vote. We could then cast this using a standard cast function or the reshape2 package so that each row is a legislator, a legislator-congress, and so on. The longer format will serve our purposes for now. Note that dim1 and dim2 are the Common Space DW-NOMINATE positions on the first and second ideological dimensions. They are fixed over the legislator's tenure in office.
## Only retain certain columns with respect to the legislator and the vote
dadtLong <- melt_rollcall(dadt,
                          legiscols = c("name", "state_abbrev", "party_code", "dim1", "dim2"),
                          votecols = c("vname", "date", "chamber"))
head(dadtLong)
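Casting a long legislator-vote table back to one row per legislator can be done with the base R reshape function. The data frame below is made up and its column names (id, vname, vote) are assumptions for illustration; they are not guaranteed to match the columns returned by melt_rollcall.

```r
## Made-up long data: one row per legislator-vote
long <- data.frame(
  id    = c("A", "A", "B", "B"),
  vname = c("v1", "v2", "v1", "v2"),
  vote  = c(1, 6, 6, 1),
  stringsAsFactors = FALSE
)

## Cast to wide: one row per legislator, one column per roll call
wide <- reshape(long, idvar = "id", timevar = "vname", direction = "wide")
wide
```

The resulting columns are named vote.v1 and vote.v2, following the reshape convention of joining the value column name and the roll call name with a dot.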
Included in the package is a data frame that links the numeric ICPSR codes to state names and state mail codes. You can load the data by calling data(states). We use this to merge in the proper state names that will be matched to the Lax and Phillips (2009) dataset. Obama appears three times in this dataset and will be dropped in this merge.
data(states)
dadtLong <- merge(dadtLong, states[, c("state_abbrev", "state_name")],
                  by = "state_abbrev")
dadtLong$state_name <- tolower(dadtLong$state_name)
Now we use the Lax and Phillips (2009) data, which we also make available in the package as lpOpinion.
data(lpOpinion)
lpOpinion$state <- tolower(lpOpinion$state)
df <- merge(dadtLong, lpOpinion, by.x = "state_name", by.y = "state")
head(df)
Now let's build a dichotomous variable that represents whether the legislator voted yea on that bill (1), nay on that bill (0), or abstained (NA).
## Recode votes
df$voteYes <- ifelse(df$vote == 1, 1, ifelse(df$vote == 6, 0, NA))

## Raw votes by party
table(df$party_code, df$voteYes, useNA = "always")

## Recode party (add independent to democrats)
df$republican <- ifelse(df$party_code == "200", 1, 0)
Let's use meanOpinion from the Lax and Phillips (2009) data, which is the average of pro-gay public opinion sentiment on various dimensions. We will use it in a couple of analyses.
## Simple model
summary(lm(voteYes ~ meanOpinion, data = df))

## Control for party
summary(lm(voteYes ~ meanOpinion*republican, data = df))

## Control for ideology
## Note that ideology here has been estimated using these and later votes,
## so interpret the results with some caution
summary(lm(voteYes ~ meanOpinion*republican + dim1 + dim2, data = df))

## Now let's look just at repealing don't ask don't tell and add chamber fixed effects
summary(lm(voteYes ~ meanOpinion*republican + dim1 + dim2 + chamber,
           data = df[df$vname != "RH1110222", ]))
Even when controlling for ideology and party, it seems that legislators, and especially Republican legislators, are more likely to vote for pro-gay rights bills when their state has a high average level of pro-gay rights sentiment.
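Reading a party-specific effect off a model with an interaction term means adding the interaction coefficient to the main effect. The sketch below shows that arithmetic on simulated data generated purely for illustration; the variable names mirror the models above, but the numbers have no connection to the Lax and Phillips (2009) data or to real votes.

```r
## Simulated data for illustration only (not real opinion or vote data)
set.seed(1)
n <- 200
sim <- data.frame(
  meanOpinion = runif(n, 30, 70),
  republican  = rbinom(n, 1, 0.5)
)
p <- plogis(-2 + 0.04 * sim$meanOpinion - 1.5 * sim$republican +
              0.02 * sim$meanOpinion * sim$republican)
sim$voteYes <- rbinom(n, 1, p)

## Linear probability model with a party interaction
fit <- lm(voteYes ~ meanOpinion * republican, data = sim)
b <- coef(fit)

## Slope of opinion for non-Republicans: the main effect alone
slope_other <- b[["meanOpinion"]]

## Slope of opinion for Republicans: main effect plus interaction
slope_rep <- b[["meanOpinion"]] + b[["meanOpinion:republican"]]

c(other = slope_other, republican = slope_rep)
```

A positive interaction coefficient means the estimated slope of opinion is steeper for Republicans than for other legislators, which is the comparison behind the "especially Republican legislators" reading.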