In wmay/dwnominate: An Interface to the DW-NOMINATE Roll Call Scaling Program

library(knitr)
opts_chunk$set(collapse = TRUE, comment = "#>", cache = TRUE)

DW-NOMINATE uses a local optimization algorithm to find an approximate maximum likelihood estimate of legislator ideal points. However, the likelihood function is complex and can contain local maxima that aren't the global maximum. It's therefore important to provide starting estimates close to the global maximum so that DW-NOMINATE converges to the correct results.
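To make the sensitivity to starting values concrete, here is a minimal base R sketch. It uses a toy two-peaked log-likelihood as a stand-in (an assumption for illustration only, not the DW-NOMINATE likelihood) and the general-purpose `optim()` local optimizer. The names `loglik` and `fit_from` are hypothetical.

```r
# Toy log-likelihood with two peaks: a lower one near x = -2 and the
# global one near x = 2. Illustrative stand-in, not DW-NOMINATE's likelihood.
loglik <- function(x) {
  log(0.3 * dnorm(x, mean = -2, sd = 1) + 0.7 * dnorm(x, mean = 2, sd = 1))
}

# optim() minimizes, so pass the negative log-likelihood
fit_from <- function(start) {
  optim(start, function(x) -loglik(x), method = "BFGS")
}

far_fit  <- fit_from(-3)  # start near the lower (local) peak
near_fit <- fit_from(3)   # start near the global peak

# Estimated location and log-likelihood reached from each start
c(far = far_fit$par, near = near_fit$par)
c(far = -far_fit$value, near = -near_fit$value)
```

In this sketch the optimizer started at -3 settles on the lower peak, while the one started at 3 reaches the higher one. The same kind of behavior is why the choice of starting estimates matters here.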

There's currently no single best approach for creating starting estimates. The choice of approach depends on properties of the data. In legislatures with large amounts of membership overlap between sessions and stable dimensions, static estimates can be straightforward to calculate and work well. When that's not the case, some manual intervention may be required. If the results of previous scaling estimates are available, those can also be used.

W-NOMINATE and optimal classification

The wnominate and oc packages in R can create static ideal point estimates using W-NOMINATE and optimal classification methods, respectively. Neither package is designed to work with multiple legislative sessions, but they can be used for this purpose by combining the voting data from multiple sessions into a single rollcall object.

The downside is that both W-NOMINATE and optimal classification can take much longer to run than DW-NOMINATE when there are many sessions. Of the two, oc is typically faster.

library(dwnominate)
library(oc)
data(nhsenate)

# combine NH Senate rollcall objects
combined_nhsenate <- merge_rollcalls(nhsenate)
# get optimal classification scores for the combined rollcalls
starts <- oc(combined_nhsenate, polarity = c(1, 1))
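
As a rough illustration of what combining sessions involves, the sketch below stacks two small, made-up vote matrices side by side in base R. It only shows the general idea (legislators matched by ID, missing values where a legislator did not serve) and is not necessarily how `merge_rollcalls()` is implemented. The matrices, the helper `expand_session`, and the use of `NA` for absent members are all assumptions.

```r
# Hypothetical vote matrices for two sessions (1 = yea, 0 = nay).
# Row names are legislator IDs; column names are roll call IDs.
session1 <- cbind(s1_v1 = c(A = 1, B = 0, C = 1),
                  s1_v2 = c(A = 0, B = 1, C = 1))
session2 <- cbind(s2_v1 = c(B = 1, C = 1, D = 0),
                  s2_v2 = c(B = 0, C = 0, D = 1))

sessions <- list(session1, session2)
all_ids <- sort(unique(unlist(lapply(sessions, rownames))))

# Expand one session to the full legislator list, leaving NA where a
# legislator did not serve in that session
expand_session <- function(votes, ids) {
  out <- matrix(NA_real_, nrow = length(ids), ncol = ncol(votes),
                dimnames = list(ids, colnames(votes)))
  out[rownames(votes), ] <- votes
  out
}

# Bind the expanded sessions column-wise: one row per legislator,
# one column per roll call across all sessions
combined <- do.call(cbind, lapply(sessions, expand_session, ids = all_ids))
combined
```

Legislators B and C, who serve in both sessions, are what link the two sets of roll calls together. The more such overlap there is, the better a static scaling of the combined data can place everyone on a common scale.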

Common space scores

Another strategy for deriving static scores is to calculate scores for each session individually, and combine the results into a single overall score. That's the approach used by the common space method. This method is more computationally efficient than running W-NOMINATE on the combined set of rollcalls.

library(dwnominate)
library(wnominate)
data(nhsenate)

# run W-NOMINATE on every session of the NH Senate
wnom_list <- lapply(nhsenate, wnominate, polarity = c(1, 1))
# extract common space scores
starts <- common_space(wnom_list)

Manual strategies

Poole and Rosenthal's original D-NOMINATE paper describes some of the difficulties they encountered when creating starting estimates:

In actual practice a legislator votes on only a small slice of all roll calls in the history of a House of Congress, so there is very substantial "missing" data. "Missing" data is not a problem so long as there is, as in modern times, substantial overlap in careers. But when the membership of either House shifts very rapidly, the results become sensitive to the starts. The problem is greatest for the House in the nineteenth century. With large amounts of missing data, Poole's procedure provided poor starts to D-NOMINATE.

Our approach to this problem was to watch animated videos of the scaling results. When rapid movement induced a "twist" in the position of senators, we investigated multiplying second dimension starts for certain years (in the nineteenth century only) by -1 -- thereby flipping polarity. The result was to have a very slight improvement in the overall geometric mean probability and to substantially reduce the magnitudes of estimated trend coefficients in the period in question.
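
The polarity flip described in the passage above amounts to negating the second-dimension starting coordinates for selected sessions. A minimal base R sketch follows, assuming the starts are held in a data frame with one row per legislator-session. The data frame, its column names, and the choice of which session to flip are all hypothetical.

```r
# Hypothetical starting coordinates: one row per legislator-session
start_coords <- data.frame(
  legislator = c("A", "B", "A", "B", "A", "B"),
  session    = c(1, 1, 2, 2, 3, 3),
  coord1D    = c(-0.5, 0.4, -0.5, 0.4, -0.6, 0.5),
  coord2D    = c(0.2, -0.3, -0.2, 0.3, 0.2, -0.3)
)

# Sessions whose second dimension looks "twisted" relative to their
# neighbors (chosen by inspection; session 2 in this made-up example)
flip_sessions <- 2

# Multiply the second-dimension starts by -1 in those sessions only
flipped <- start_coords
to_flip <- flipped$session %in% flip_sessions
flipped$coord2D[to_flip] <- -flipped$coord2D[to_flip]
flipped
```

After the flip, each legislator's second-dimension start has the same sign in all three sessions, removing the artificial jump in session 2. First-dimension starts are left untouched.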

@carroll_measuring_2009 [p. 265] further describe this process and how it relates to the starting values for the original DW-NOMINATE scores:

DW-NOMINATE like its predecessor D-NOMINATE does not generate its own parameter starting values because of the sheer size of the roll call data set being analyzed. Good starting values are crucial to reliably performing this complex nonlinear estimation. In their development of D-NOMINATE, Poole and Rosenthal experimented for over 2 years until they had a satisfactory solution to the problem. Essentially, they began with what they thought was a sensible set of starting values and studied the estimated legislator coordinates by turning them into computer animations that could be played on a VHS videorecorder. In these animations, letter tokens and colors for the estimated legislator coordinates were used to identify parties and regions of the United States. Poole and Rosenthal then relied upon their knowledge of American political history to see if the output made visual sense. If they saw anomalies, they adjusted the output coordinates to correct for the anomalies and then used these adjusted coordinates as new starting values. At each step of this process, the fit of the model increased but it was a relatively arduous task because the supercomputers of the 1980s were limited in their capabilities and the animations took about a week to make on the equipment at the Pittsburgh Supercomputer Center.

The upshot of what one of us has described as "Poole and Rosenthal as the outer loop of the estimation" is that the final coordinates from D-NOMINATE that were released to the larger research community in 1989–90 were about as close to the global maximum of the (constrained) likelihood function shown in equation (10) as was practicable given the computer resources of the time.

Consequently, when DW-NOMINATE was developed in 1996 it simply used for starts the D-NOMINATE coordinates along with patched-in W-NOMINATE coordinates for Congresses 100–105.

@poole_spatial_2005 [p. 140] describes a simpler approach, piecing together common space scores estimated on subsets of the data:

I developed the CSS method in 1982, and Howard Rosenthal and I used it along with a simple metric scaling method (Poole 1984, 1990; Poole and Rosenthal 1997) to get starting coordinates in one and two dimensions for DW-NOMINATE. We used common-space coordinates for stable periods and then pieced together all the sets of starting coordinates for Congresses 1 to 100.

Using existing scores

Rather than generating custom starting estimates, it can be easier to use the results of previous scalings when these are available. For the US Congress, Voteview.com provides results of the DW-NOMINATE constant model [@lewis_voteview_2021; @boche_new_2018]. For US state legislatures, NPAT scores are available [@shor_individual_2020; @shor_ideological_2011].