A collection of tools for working with survey data from the British Election Study (BES) for statistical researchers. This package is principally designed for use by researchers in the House of Commons Library but may be useful to anyone using R for routine data analysis.
This package provides functions for generating turnout estimates from BES survey data by demographic characteristic. It is designed to make it easier for researchers who do not use BES survey data frequently to generate consistent and reproducible turnout estimates from a range of BES datasets.
Detailed voting behaviour by demographic characteristic is not officially collected in UK elections. To understand how different demographics voted (or not), researchers are reliant on estimates produced from opinion polls and survey data. The BES is the most comprehensive and reliable survey series.
There are principally two types of survey produced by the BES after an election: a panel dataset (commonly known as the internet panel) and a cross-sectional dataset (commonly known as the face to face panel). The internet panel is produced much sooner after an election than the face to face panel, although the latter is more robust due to a series of voter registration validation checks.
Because survey respondents tend to over-report whether they voted, total turnout estimates from the internet panel are often much higher than the actual known turnout for an election. To compensate, a series of adjustments must be applied so that estimated total turnout equals the known turnout result. The `clbes` package aims to provide a consistent and reproducible way to generate turnout estimates for both the internet and face to face datasets.
The `turnout_query` function can be used with either the BES internet or face to face datasets. It has four compulsory arguments and five optional arguments which, in varying combinations, return unweighted or weighted sample sizes and turnout estimates.
The first of the compulsory arguments is `data`, which takes the name of the BES dataset you have loaded into R (it is recommended that the SPSS versions of the BES files are used in conjunction with the `haven` package). The second is `dgraphic`, the name of the demographic characteristic variable of interest in the BES dataset. The third, `vote`, takes the name of the variable that captures whether the BES survey respondent said they voted in the election of interest. The fourth compulsory argument is `wt`, which takes the name of the weighting variable to be used.
The codebooks accompanying BES datasets should be read for descriptions of the variables available and, in particular, for which weights should be used. There are different weights for the internet and face to face datasets.
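The four compulsory arguments map onto a simple weighted calculation. The base R sketch below is not the package's implementation and does not call `turnout_query`; it only illustrates what a weighted turnout estimate by demographic group looks like. The toy data, its column names and the helper `weighted_turnout` are all hypothetical.

```r
# Toy data standing in for a BES extract; every column name is hypothetical.
survey <- data.frame(
  age_group = c("18-34", "18-34", "18-34", "35+", "35+", "35+"),
  voted     = c(1, 0, 1, 1, 1, 0),            # 1 = said they voted, 0 = did not
  weight    = c(1.2, 0.8, 1.0, 0.9, 1.1, 1.0)
)

# Weighted turnout per group:
# sum of weights for respondents who voted / sum of all weights in the group.
weighted_turnout <- function(data, dgraphic, vote, wt) {
  voters <- tapply(data[[wt]] * data[[vote]], data[[dgraphic]], sum)
  total  <- tapply(data[[wt]], data[[dgraphic]], sum)
  voters / total
}

estimates <- weighted_turnout(survey, "age_group", "voted", "weight")
print(round(estimates, 3))
```

The helper takes column names as strings purely for convenience in this sketch; how `turnout_query` expects its arguments to be passed should be checked against the package documentation.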
If you are using a BES dataset where voter validation checks have not been carried out (typically the BES internet datasets), then the argument `validated` should be set to FALSE, and the argument `result` must take a number between 0 and 1 giving the known turnout for the whole of Great Britain in the election of interest, expressed as a proportion (e.g. 0.688 to indicate a 68.8% turnout rate).
When `validated` is FALSE and `result` is given a numeric value, the function filters your loaded BES dataset to keep only survey respondents who are over the age of 18 and say they are registered to vote. Additionally, any "Don't Know" survey responses are removed. The turnout estimates generated from the weighted sample are then adjusted by `result` so that the total estimated turnout equals `result`. This goes some way towards accounting for the over-reporting of turnout by survey respondents, although it assumes the rate of over-reporting is equal across survey respondents.
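One way to picture the adjustment is as a single scaling factor applied to every group. The sketch below assumes that factor is the known result divided by the overall self-reported turnout; this is an illustration of the idea of equal over-reporting, not necessarily how the package computes it, and all figures other than 0.688 are invented.

```r
# Hypothetical weighted figures: self-reported turnout and weighted group size.
reported <- c("18-34" = 0.80, "35-54" = 0.88, "55+" = 0.94)
group_wt <- c("18-34" = 300, "35-54" = 350, "55+" = 350)

result <- 0.688  # known turnout, matching the 68.8% example in the text

# Overall self-reported turnout is the weight-averaged group turnout.
overall <- sum(reported * group_wt) / sum(group_wt)

# Assumed adjustment: one common scaling factor for every group.
scale_factor <- result / overall
adjusted <- reported * scale_factor

# After scaling, the overall estimate matches the known result.
adjusted_overall <- sum(adjusted * group_wt) / sum(group_wt)
print(round(adjusted, 3))
print(adjusted_overall)
```

Because every group is multiplied by the same factor, the ranking of groups is preserved; only the level of the estimates changes.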
If your dataset has been validated (usually the face to face datasets) and you are using the appropriate validation weights as specified in the BES codebooks, then `validated` and `result` should be left as their defaults.
If you want to show the sample size (unweighted or weighted) for the generated turnout estimates, then the argument `percent` must be FALSE (this stops the function from calculating percentages), and `sample` either FALSE (its default) or TRUE. If both `sample` and `percent` are FALSE, the weighted sample size is returned. If `sample` is TRUE and `percent` is FALSE, the unweighted sample size is returned. If `percent` is not set to FALSE, sample sizes will not be shown.
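The difference between the two sample sizes can be shown with a few lines of base R. This is a sketch on invented data with hypothetical column names, not output from the package: the unweighted size is a count of respondents, and the weighted size is the sum of their weights.

```r
# Toy data; column names and values are hypothetical.
survey <- data.frame(
  gender = c("Female", "Female", "Male", "Male", "Male"),
  weight = c(1.5, 0.5, 1.0, 0.8, 1.2)
)

# Unweighted sample size: a plain count of respondents per group.
unweighted_n <- table(survey$gender)

# Weighted sample size: the sum of the weights per group.
weighted_n <- tapply(survey$weight, survey$gender, sum)

print(unweighted_n)
print(weighted_n)
```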
To show sample sizes for unvalidated data (`validated` is FALSE), `result` should be left at its default, 0. This stops all sample sizes being multiplied by `result`.
By default, all turnout estimates and sample sizes are returned as tibbles in the console. However, if `write` is TRUE, the results are written to the working directory as a CSV file named "turnout_`dgraphic`.csv".
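The file naming can be sketched as follows, assuming the value passed to `dgraphic` is substituted into the file name. That substitution is an assumption, the data are invented, and the sketch writes to a temporary directory rather than the working directory.

```r
# Hypothetical demographic variable name and invented estimates.
dgraphic <- "age_group"
estimates <- data.frame(
  age_group = c("18-34", "35+"),
  turnout   = c(0.61, 0.74)
)

# Assumed naming pattern: "turnout_" + variable name + ".csv".
file_name <- paste0("turnout_", dgraphic, ".csv")

# Write to a temporary directory so the sketch leaves no files behind.
out_path <- file.path(tempdir(), file_name)
write.csv(estimates, out_path, row.names = FALSE)

print(file_name)
```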
The `turnout_common` function is a simpler version of `turnout_query`, although it is written to work specifically with the latest BES 2017 face to face survey only. Its purpose is to give researchers turnout estimates for commonly queried demographics from that survey. There is no need to supply a BES dataset to the function, as this is done automatically whenever the function is called.
The `dgraphic` argument is much the same as in `turnout_query`, although it must take one of five quoted strings: "age", "gender", "ethnicity", "religion" or "education". Each string relates to a demographic characteristic within the BES face to face survey, and the function automatically recodes and condenses variables where appropriate to create larger sample sizes per characteristic (e.g. "ethnicity" recodes the y11 variable from its 18 possible values to four).
The remaining arguments, such as `write`, behave in the same way as in `turnout_query`. The only difference is that when `write` is TRUE, the CSV file is named "BES_2017_FTF_turnout_`dgraphic`.csv".
Install from GitHub using devtools: