A collection of tools to help statistical researchers work with survey data from the British Election Study (BES). The package is principally designed for use by researchers in the House of Commons Library, but may be useful to anyone using R for routine data analysis.

This package provides functions for generating turnout estimates from BES survey data by demographic characteristic. It is designed to make it easier for researchers who do not use BES survey data frequently to generate consistent and reproducible turnout estimates from a range of BES datasets.


Detailed voting behaviour by demographic characteristic is not officially recorded in UK elections. To understand how different demographic groups voted (or whether they voted at all), researchers rely on estimates produced from opinion polls and survey data. Of these, the BES is the most comprehensive and reliable survey series.

The BES principally produces two types of survey after an election: a panel dataset (commonly known as the internet panel) and a cross-sectional dataset (commonly known as the face to face survey). The internet panel is released much sooner after an election than the face to face survey, although the latter is more robust because it undergoes a series of voter validation checks.

Because survey respondents tend to over-report whether they voted, total turnout estimates from the internet panel are often much higher than the actual turnout recorded at an election. To compensate, adjustments need to be applied so that estimated total turnout equals the known turnout result. The clbes package aims to provide a consistent and reproducible way to generate turnout estimates from both the internet and face to face datasets.

Turnout query

The turnout_query function can be used with either the BES internet or face to face datasets. It has four compulsory arguments and five optional ones which, set in different combinations, return turnout estimates or weighted or unweighted sample sizes.

Compulsory arguments

The first compulsory argument, data, takes the name of the BES dataset you have loaded into R (the BES SPSS file versions, read in with the haven package, are recommended). The second, dgraphic, is the name of the demographic characteristic variable of interest in the BES dataset. The third, vote, takes the name of the variable that records whether the respondent said they voted in the election of interest. The fourth, wt, takes the name of the weighting variable to be used.

The codebooks accompanying the BES datasets should be consulted for descriptions of the available variables and, in particular, for which weights to use. The internet and face to face datasets have different weights.
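To make the roles of the four compulsory arguments concrete, the following base R sketch computes a weighted turnout estimate for each category of a demographic variable: the sum of the weights of respondents who voted, divided by the sum of all weights in that category. The toy data frame, its column names and the weighted_turnout() helper are hypothetical illustrations rather than clbes code, and vote is assumed here to be coded 1 for voted and 0 otherwise.

```r
# Hypothetical toy data standing in for a loaded BES dataset:
# age_group plays the role of dgraphic, voted of vote, wt of wt.
bes_toy <- data.frame(
  age_group = c("18-24", "18-24", "18-24", "65+", "65+", "65+"),
  voted     = c(1, 0, 1, 1, 1, 0),
  wt        = c(1.2, 0.8, 1.0, 0.9, 1.1, 1.0)
)

# Hypothetical helper: weighted turnout per category of dgraphic,
# i.e. sum of weights of voters / sum of all weights in that category.
weighted_turnout <- function(data, dgraphic, vote, wt) {
  groups <- split(data, data[[dgraphic]])
  sapply(groups, function(g) sum(g[[wt]] * g[[vote]]) / sum(g[[wt]]))
}

est <- weighted_turnout(bes_toy, "age_group", "voted", "wt")
est  # 18-24: 2.2 / 3.0, 65+: 2.0 / 3.0
```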

Non-compulsory arguments

Validated and result

If you are using a BES dataset where voter validation checks have not been carried out (typically the BES internet datasets), then the argument validated should be set to FALSE, and the argument result must be a number between 0 and 1 giving the known turnout for the whole of Great Britain in the election of interest, e.g. 0.688 for a turnout of 68.8%.

When validated is FALSE and result is given a value, the function filters the loaded BES dataset to keep only respondents who are over the age of 18 and say they are registered to vote, and removes any "Don't know" responses. The weighted turnout estimates are then adjusted by result so that total estimated turnout equals result. This goes some way towards accounting for respondents over-reporting their turnout, although it assumes that the rate of over-reporting is the same across all respondents.
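One way to implement an adjustment of this kind is proportional scaling: every group's weighted turnout is multiplied by the same factor, result divided by the overall weighted turnout, so the overall estimate matches the known result while the relative differences between groups are kept. This reflects the assumption that over-reporting is equal across respondents, but it is a sketch under assumptions, not necessarily the formula clbes uses. The column names age, registered, voted, group and wt, the "Yes"/"No"/"Don't know" coding and the adjust_turnout() helper are all hypothetical.

```r
# Hypothetical toy data standing in for an unvalidated (internet panel) dataset.
panel <- data.frame(
  age        = c(17, 22, 30, 24, 33, 45, 70, 80, 52, 61, 19),
  registered = c(rep("Yes", 10), "No"),
  voted      = c("Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes",
                 "Don't know", "No", "Yes", "No"),
  group      = c(rep("Under 35", 5), rep("35+", 5), "Under 35"),
  wt         = c(1.0, 0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.0, 1.3, 0.9, 1.0)
)

# Hypothetical helper: filter, estimate weighted turnout by group, then rescale.
adjust_turnout <- function(data, dgraphic, vote, wt, result) {
  # Keep respondents aged 18+ (an assumed reading of "over 18") who say they
  # are registered, and drop "Don't know" answers to the turnout question.
  keep <- data$age >= 18 & data$registered == "Yes" & data[[vote]] != "Don't know"
  d <- data[keep, ]
  voted <- d[[vote]] == "Yes"

  # Raw weighted turnout overall and within each group.
  overall  <- sum(d[[wt]] * voted) / sum(d[[wt]])
  by_group <- tapply(d[[wt]] * voted, d[[dgraphic]], sum) /
    tapply(d[[wt]], d[[dgraphic]], sum)

  # Scale every group by the same factor so the overall estimate equals result.
  by_group * (result / overall)
}

adjusted <- adjust_turnout(panel, "group", "voted", "wt", result = 0.688)
adjusted
```

Because each group is scaled by the same factor, the weighted average of the adjusted group estimates equals result exactly, and the ratio between any two groups is unchanged.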

If your dataset has been validated (usually the face to face datasets) and you are using the appropriate validation weights specified in the BES codebooks, leave validated and result at their defaults.

Sample, percent and write

To show the sample size (unweighted or weighted) behind the turnout estimates, set percent to FALSE (this stops the function from calculating percentages) and sample to either FALSE (its default) or TRUE.

When sample and percent are both FALSE, the weighted sample size is returned. If sample is TRUE and percent is FALSE, the unweighted sample size is returned. Unless percent is set to FALSE, sample sizes are not shown.
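The difference between the two sample sizes can be illustrated in base R: the unweighted sample size is the number of respondents in each category, while the weighted sample size is the sum of their survey weights. The data below are made up for illustration and are not taken from any BES dataset.

```r
# Hypothetical respondents with a demographic variable and a survey weight.
resp <- data.frame(
  gender = c("Female", "Female", "Male", "Male", "Male"),
  wt     = c(1.5, 0.7, 0.8, 1.2, 0.6)
)

# Unweighted sample size: count of respondents per category.
unweighted_n <- table(resp$gender)

# Weighted sample size: sum of survey weights per category.
weighted_n <- tapply(resp$wt, resp$gender, sum)

unweighted_n  # Female 2, Male 3
weighted_n    # Female 2.2, Male 2.6
```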

To show sample sizes for unvalidated data (validated is FALSE), leave result at its default of 0; otherwise every sample size is multiplied by result.

By default, all turnout estimates and sample sizes are returned as tibbles in the console. If write is TRUE, the results are written to a CSV file named "turnout_dgraphic.csv" in the working directory.
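Writing results to CSV can be sketched with base R's write.csv(). This assumes that dgraphic in the file name stands for the name of the demographic variable passed to the function; the estimates are placeholder numbers, and the file goes to a temporary directory rather than the working directory so the example leaves nothing behind.

```r
# Placeholder estimates, not real BES results.
estimates <- data.frame(group = c("18-24", "65+"), turnout = c(0.55, 0.78))
dgraphic  <- "age"  # hypothetical demographic variable name

# Build a file name of the form turnout_<dgraphic>.csv and write it without row names.
out_file <- file.path(tempdir(), paste0("turnout_", dgraphic, ".csv"))
write.csv(estimates, out_file, row.names = FALSE)

read.csv(out_file)
```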

Turnout common

The turnout_common function is a simpler version of turnout_query, written to work only with the latest BES face to face survey (2017). Its purpose is to give researchers turnout estimates for commonly queried demographics from that survey. There is no need to supply a BES dataset, as the function loads it automatically whenever it is called.

The argument dgraphic works much as in turnout_query, except that it must be one of five strings in quotation marks: "age", "gender", "ethnicity", "religion" or "education". Each string corresponds to a demographic characteristic in the BES face to face survey, and the function automatically recodes and condenses the underlying variables where appropriate to produce larger sample sizes per category (e.g. "ethnicity" recodes the y11 variable from its 18 possible values to four).
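Condensing a detailed variable into fewer, larger categories can be done in base R with cut() for numeric variables such as age. The ages and band boundaries below are illustrative assumptions, not the groupings turnout_common actually uses.

```r
# Hypothetical ages in years.
age <- c(18, 24, 25, 34, 35, 54, 55, 64, 65, 90)

# cut() uses right-closed intervals by default, so (17, 24] becomes "18-24".
age_band <- cut(age,
                breaks = c(17, 24, 34, 54, 64, Inf),
                labels = c("18-24", "25-34", "35-54", "55-64", "65+"))

table(age_band)  # two respondents fall in each band
```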

The sample, percent and write arguments behave as they do in turnout_query. The only difference is that when write is TRUE, the CSV file is named "BES_2017_FTF_turnout_dgraphic.csv".


Install from GitHub using devtools:
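Assuming the package lives in the yespmedleon/clbes GitHub repository named in the documentation footer, a typical devtools call would be:

```r
# Install devtools first if it is not already available.
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# Repository name assumed from the documentation footer.
devtools::install_github("yespmedleon/clbes")
library(clbes)
```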


yespmedleon/clbes documentation built on May 10, 2019, 1:54 a.m.