knitr::opts_chunk$set(comment = "#>", fig.align='center',
                      fig.width = 7, fig.height = 5)
library(Rvoteview)

This package provides tools to query and download from the VoteView database. This vignette will demonstrate the different types of queries that can be used, how Rvoteview can be used to do ideal point estimation on a subset of votes using the pscl package and the wnominate package, and how Rvoteview facilitates regression analyses of congressional voting behavior.

  1. Installation
  2. Querying the database with voteview_search
  3. Downloading roll call data with voteview_download
  4. Additional Methods a. Joining two rollcall objects b. Melting rollcall objects c. Completing interrupted downloads d. Retrieving member data
  5. Extended Examples a. Ideal point estimation] b. Regression analysis of roll call behavior

Installation

To install this package, ensure you have devtools installed. If you do not, run install.packages("devtools") and then install from GitHub using

devtools::install_github("voteview/Rvoteview")

For a quick start, see the README in the GitHub repository here.

Querying the database with voteview_search

The first main function of this package is to allow users to search for roll calls. Using a custom query parser, we allow both simple and complex queries to be made to the VoteView database. The simple way uses a set of arguments to build a query within the R package while the complex way allows the user to build a specific query with nested, boolean logic. Both can also be used simultaneously. You can find the full documentation for the query parser here.

Simple text queries

The q argument should be treated similarly to a search box online. You can put in text search terms, specific fields with parameters, or it can be left blank if other arguments are used. The simple usage is to treat the q argument as a way to search all text fields. If you want to search a specific phrase, put the query in quotes. This will essentially look for that exact phrase in many of the text fields in the database. Alternatively, if you search without using quotes, the word will be lemmatized (shortened) and will search an index of the text fields. For example, we can search for "terrorism" exactly or loosely using the index:

library(Rvoteview)
res <- voteview_search("'terrorism'") # exact
res <- voteview_search("terrorism")   # index based

You can also search for multiple words:

res <- voteview_search("terrorism iraq") # index based search

Using the text index, the MongoDB that houses the rollcalls will search for the documents for either of these words and return the best matches. In effect, this will return documents that have either "terror" or "iraq" or various shortened versions of those words.

Basic syntax

When using one of the simple queries above, the query parser automatically adds a field to the front of a query that does not specify which field to search. In order to specify a specific field, use the following fieldname:query syntax. To replicate the last example more explicitly, we use the following:

res <- voteview_search("alltext:terrorism iraq")

Unfortunately, due to the way the query parser works, you cannot search for two exact words at the moment or search in two different specific text fields. You can however look within a specific text field.

res <- voteview_search("vote_desc:'iraq'")

Using additional arguments

Users can also use other arguments to search only roll calls that are in a certain chamber of Congress, within a date range, within a certain set of congresses, and within a level of support, defined as the percent of total valid votes that were yea vote. This is especially useful if users only want to return competitive votes. Note that all fields are joined using "AND" logic; for example you search for roll calls using the keyword "tax" AND are in the House but not votes that either use the keyword "tax" OR were held in the House. Also note that the congress field uses "OR" logic within the numeric vector that specifices which congress to search. No roll call can be in two congresses, so it makes no sense to search for roll calls that are in one congress AND in another congress.

## Search for votes with a start date
## Note that because tax is not in quotes, it searches the text index and not for
## exact matches
res <- voteview_search("tax", startdate = "2005-01-01")

## Search for votes with an end date in just the House
res <- voteview_search("tax", enddate = "2005-01-01", chamber = "House")

## Search for votes with a start date in just the house in the 110th or 112th Congress
res <- voteview_search("tax",
                       startdate = "2000-12-20",
                       congress = c(110, 112),
                       chamber = "House")

You can always see exactly what search was used to create a set of roll calls by retrieving the 'qstring' attribute of the returned data frame:

attr(res, "qstring")

Building complex queries

As previewed before, users can use the q argument to specify complex queries by specifying which fields to search and how to combine fields using boolean logic. The complete documentation can be found here. In general, the following syntax is used, field:specific phrase (field:other phrase OR field:second phrase).

For example, if you wanted to find votes where 'war' and 'iraq' were present but only up to 1993 and after 2000, you could write it like so:

qString <- "alltext:war iraq (enddate:1993 OR startdate:2000)"
res <- voteview_search(q = qString)

Whenever in doubt, add parentheses to make the query clearer!

Numeric fields can be searched in a similar way, although users can also use square brackets and "to" for ranges of numbers. For example, the query for all votes about taxes in the 100th to 102nd congress could be expressed either using "alltext:taxes congress:100 OR congress:101 OR congress:102" or using "alltext:taxes congress:[100 to 102]". Note that if you want to restrict search to certain dates, the startdate and enddate arguments in the function should be used.

For example, here is a query that will get votes from the 100 to 102nd congress on tax where the percent of the rollcall votes in favor will be between 45 and 55 percent, inclusive.

qString <- "alltext:tax iraq (congress:[100 to 102] AND support:[45 to 55])"
res <- voteview_search(q = qString)

Downloading roll call data with voteview_download

The second main function of this package is to allow users to download detailed roll call data into a modified rollcall object from the pscl package. The default usage is to pass voteview_download a vector of roll call id numbers that we return in the voteview_search function.

## Search all votes with the exact phrase "estate tax" in the 105th congress
res <- voteview_search("'estate tax' congress:105")

## Download all estate tax votes
rc <- voteview_download(res$id)

summary(rc)
summary(rc)

Importantly, the object we return is a modified rollcall object, in that it may contain additional elements that the authors of the pscl package did not include. Therefore it will work with all of the methods they wrote for rollcall objects as well as some methods we include in this package. The biggest difference between the original rollcall object and what we return is the inclusion of "long" versions of the votes.data and legis.data data frames, described below.

First, because icpsr numbers are not necessarily unique to legislators, we include legis.long.dynamic in the output. For example, when Strom Thurmond changed parties, his icpsr number also changed. However, when building rollcall objects, icpsr numbers are the default. Therefore, legis.long.dynamic contains a record of every legislator-party-congress as a unique id, as well as the relevant covariates.

Second, we include votes.long, a data frame where the rows are legislator-roll calls and contain how each legislator voted on each roll call. This is the long version of the votes matrix included in all rollcall objects.

Additional Methods

We also add three methods that can be used on rollcall objects created by our package.

Joining two rollcall objects

The first function allows for a full outer join of two rollcall objects downloaded from the VoteView database, creating a new rollcall object that is a union of the two. It is called by using the %+% operator. This is especially useful if the user downloaded two roll call objects at separate times and wants to join them together rather than re-download all of the votes at the same time.

try({detach("package:ggplot2", unload=TRUE)}, silent = T)
## Search all votes with exact phrase "estate tax"
res <- voteview_search("'estate tax' congress:105")

## Download first 10 votes
rc1 <- voteview_download(res$id[1:10])
## Download another 10 votes with some overlap
rc2 <- voteview_download(res$id[5:14])

## Merge them together
rcall <- rc1 %+% rc2

rcall$m # The number of total votes
rcall$m

Melting rollcall objects

We also provide a function called melt_rollcall which allows users to produce a long data frame that is essentially the same as votes.long but includes all of the roll call and legislator data on each row.

## Default is to retain all data
rc_long <- melt_rollcall(rcall)
rc_long[1:3, 1:17]

## Retaining fewer columns
rc_long <- melt_rollcall(rcall, votecols = c("chamber", "congress"))
rc_long[1:3, ]

Completing interrupted downloads

If your internet connection drops in the middle of a download or you have to interrupt a download for some reason, the voteview_download function should try to complete building the rollcall object with whatever data it has successfully downloaded. While manually interrupting functions in R is tricky and we cannot catch interrupts perfectly, if it does succeed or if your connection does drop, then we store the roll call ids that you were unable to retrieve in the unretrievedids slot of our modified rollcall object. Users can then use the complete_download function to download the unretrieved ids and create a complete rollcall object. For example, imagine the following download stalls as your wireless cuts out at this cute coffee shop that has beans roasted in house but cannot manage a good wireless conenction:

rc_fail <- voteview_download(res$id)

If this fails but still manages to build a rollcall object with whatever ids it was able to retrieve, then we can complete the download with a simple command:

rc <- complete_download(rc_fail)

Again, because of the difficulty with properly catching interrupts in R, this will not always work with manual interrupts, but should work with dropped internet connections.

Retrieving member data

There is also the ability to search the database for members (House Representatives, Senators, and Presidents) using the member_search function. Unfortunately, the syntax is not identical to the syntax when searching for roll calls. Nonetheless, the usage in R is quite simple. There are fields to search members' names, icpsr number, state (either ICPSR state number, two letter postal code, or the full name), the range of congresses to search within, the CQ label of the member, and the chamber to search within.

The function returns a data frame of metadata, with one row for each legislator-congress that is found (these are the unique entries in the database of members). Therefore, if we want to return all unique legislator-congresses where the name 'clinton' appears anywhere in the name fields, we can use the following search:

clintons <- member_search("clinton")

## Drop the bio field because it is quite long
clintons[1:7, names(clintons) != "bio"]

It is important to note that if there is no white space in the name field, the database is searched for exact matches for that one word. If there are multiple words we use a text index of all of the name fields and return the best matches.

If you only want to return the first record per ICPSR number, you can set the distinct flag equal to one. This is useful because it limits the size of the object returned and most data is duplicated within ICPSR number. For example, CS DW-NOMINATE scores are constant within ICPSR number, as are names and (usually) party.

clintons <- member_search("clinton",
                          state = "NY",
                          distinct = 1)

## Drop the bio field because it is quite long
clintons[, names(clintons) != "bio"]

Some other fields that are not unique to ICPSR number but may vary are the chamber of the representative, their CQ label, and the number of votes they cast. Let's get all the records for Bernie Sanders.

sanders <- member_search("sanders",
                         state = "VT")

## Drop the bio field because it is quite long
sanders[, names(sanders) != "bio"]

As you can see Sanders changes chambers between the 109th and 110th congresses and a few other fields differ as well. Nonetheless, most is repeated.

Extended Examples

This section details three different possible uses of the Rvoteview package, showing users from beginning to end how to conduct their own ideal point estimation and use Rvoteview in more traditional regression analysis.

Ideal point estimation

Imagine that we want to estimate ideal points for all legislators voting on foreign policy during the first six months of Obama's presidency.. We will use all roll calls that fit the Clausen category "Foreign and Defense Policy" and are somewhat competitive, meaning between 15 and 85 percent of votes on the floor were yeas.

## Load packages
library(ggplot2)   # Load this first so that Rvoteview can use %+%
library(Rvoteview)

## Search database for votes that meet our criteria
res <- voteview_search("codes.Clausen:Foreign and Defense Policy support:[15 to 85]",
                       startdate = "2009-01-20", enddate = "2009-07-20")
## Download votes into rollcall object
rc <- voteview_download(res$id)
summary(rc)

Now we use the wnominate package to run an ideal point estimation.

library(wnominate)
# Find extreme legislators for polarity argument
cons1 <- rc$legis.long.dynamic[which.max(rc$legis.data$dim1), c("name", "icpsr")]
cons2 <- rc$legis.long.dynamic[which.max(rc$legis.data$dim2), c("name", "icpsr")]
defIdeal <- wnominate(rc,
                      polarity = list("icpsr", c(20753, 20523)))

This ideal point estimation also returns the estimated points attached the all of the legislator and rollcall metadata already in the rc object! This can be useful in creating custom plots.

## Create text party name
defIdeal$legislators$partyName <- ifelse(defIdeal$legislators$party == 200, "Republican",
                                  ifelse(defIdeal$legislators$party == 100, "Democrat", "Independent"))

ggplot(defIdeal$legislators,
       aes(x=coord1D, y=coord2D, color=partyName, label=state_abbrev)) +
  geom_text() +
  scale_color_manual("Party", values = c("Republican" = "red",
                                         "Democrat" = "blue",
                                         "Independent" = "darkgreen")) +
  theme_bw()

We see the usual split between Republicans and Democrats.

We can also use the build in plot method from `wnominate to produce some great figures from our estimation.

# Some great plots!
plot(defIdeal)

The rollcall objects we build can also be used in the ideal function in the pscl package.

library(pscl)
defIdeal <- ideal(rc,
                  d = 2)
library(pscl)
defIdeal <- ideal(rc,
                 d = 2,
                 maxiter = 5000,
                 thin = 10,
                 burnin = 1000)

We can also use the pscl plot method.

plot(defIdeal)

Regression analysis of roll call behavior

Users can also use the VoteView API to run regression analyses. Let's take the state level opinion data on gay rights that was estimated in Lax and Phillips (2009). They used multilevel regression and poststratification on surveys from 1999-2008 in order to estimate state-level explicit support for gay rights issues. Let's pull down some important bills presented before the 111th congress (2009-2011) and see how state level public opinion in the preceding years predicts voting behavior in the legislature.

Let's see what bills there were in the 111th congress that had to do with homosexuality. We can use a search that will capture quite a few different bills.

## Two separate searches because fields cannot be joined with an OR
res <- voteview_search("codes.Issue:Homosexuality congress:111")
res[1:5, 1:10]

To focus on actual bills that were of some consequence, let's take the House and Senate don't ask don't tell votes and the hate crimes bill from the House.

dadt <- voteview_download(c("RH1111621", "RS1110678", "RH1110222"))
dadt$vote.data

Now we want to turn this into a long dataframe, where each row is a legislator-vote. We could also then cast this using a standard cast function or the reshape2 package to have each row be a legislator, or each row be a legislator-congress and so on. The longer format will serve our purposes for now. Note that dim1 and dim2 are the Common Space DW-Nominate positions on the first and second ideological dimensions. They are fixed over the legislator's tenure in office.

## Only retain certain columns with respect to the legislator and the vote
dadtLong <- melt_rollcall(dadt,
                          legiscols = c("name", "state_abbrev","party_code", "dim1", "dim2"),
                          votecols = c("vname", "date", "chamber"))
head(dadtLong)

Included in the package is a dataframe that links the numeric ICPSR codes to state names and state mail codes. You can load the data by calling data(states). We use this to merge in the proper state names that will be matched to the Lax and Phillips (2009) dataset. Obama appears three times in this dataset and will be dropped in this merge.

data(states)
dadtLong <- merge(dadtLong, states[, c("state_abbrev", "state_name")],
                  by = "state_abbrev")

dadtLong$state_name <- tolower(dadtLong$state_name)

Now we use the Lax and Phillips (2009) data, which we make available in the package as well under lpOpinion.

data(lpOpinion)
lpOpinion$state <- tolower(lpOpinion$state)

df <- merge(dadtLong, lpOpinion,
            by.x = "state_name", by.y = "state")
head(df)

Now let's build a dichotomous variable that represents whether the legislator voted yea on that bill (1), nay on that bill (0), or abstained (NA).

## Recode votes
df$voteYes <- ifelse(df$vote == 1, 1, ifelse(df$vote == 6, 0, NA))

## Raw votes by party
table(df$party_code, df$voteYes, useNA = "always")

## Recode party (add independent to democrats)
df$republican <- ifelse(df$party_code == "200", 1, 0)

Let's use meanOpinion from the Lax and Phillips (2009) data, which is the average of pro-gay public opinion sentiment on various dimensions. We will use it in a couple of analyses.

## Simple model
summary(lm(voteYes ~ meanOpinion, data = df))

## Control for party
summary(lm(voteYes ~ meanOpinion*republican, data = df))

## Control for ideology
## Note that ideology here has been estimated using these and later votes,
## so interpret the results with some caution
summary(lm(voteYes ~ meanOpinion*republican + dim1 + dim2,
           data = df))

## Now let's look just at repealing don't ask don't tell and add chamber fixed effects
summary(lm(voteYes ~ meanOpinion*republican + dim1 + dim2 + chamber,
           data = df[df$vname != "RH1110222", ]))

Even when controlling for ideology and party, it seems that legislators, and especially Republican legislators, are more likely to vote for pro-gay rights bills when their state has a high average level of pro-gay rights sentiment.



JeffreyBLewis/Rvoteview documentation built on Oct. 15, 2019, 10:18 p.m.