data_corpus_manifestosentsUK | R Documentation |
A text corpus of sentences from publicly available party manifestos from the United Kingdom, published between 1945 and 2019 Some manifestos sentences have been rated in terms of the direction of policy using crowd-sourced coders.
The manifestos from the three main parties (Labour Party, Conservatives, Liberal Democrats) between 1987 and 2010 have been labelled as Economic Policy, Social Policy, or Other, and rated in terms of the direction of Economic Policy and Social Policy. All party manifestos from the 2010 General Election have been crowd-coded in terms of immigration policy, and the direction of immigration policy. For more information on the coding approach see Benoit et al. (2016).
The
corpus contains the aggregated crowd coding values on the level of sentences.
Note that the segmentation into sentences does not always work correctly due
to missing punctuation. See Examples for how to remove very short and very
long sentences using quanteda::corpus_trim()
.
data_corpus_manifestosentsUK
The corpus consists of 88,954 documents (i.e. sentences) and includes the following document-level variables:
factor; abbreviation of the party that wrote the manifesto.
factor; party that wrote the manifesto.
integer; 4-digit year of the election.
factor; indicates the majority label assigned
by crowd workers (Economic Policy, Social Policy, or Neither). The variable
has missing values (NA
) for all non-annotated manifestos.
numeric; the direction of statements coded as "Economic Policy" or "Social Policy" based on the aggregated crowd codings. The variable is the mean of the scores assigned by the workers workers who coded the sentence and who allocated the sentence to the "majority" category. The variable ranges from -2 to +2.
For the statements aggregated as "Economic" Policy, -2 corresponds to "Very left"; +2 corresponds to "Very right". For the statements aggregated as "Social Policy" -2 corresponds to "Very liberal"; +2 corresponds to "Very conservative". The variable has missing values (NA) for all sentences that were aggregated as "Neither" and for all non-annotated manifestos.)
integer; the number of coders who contributed to the
mean score crowd_econsocial_mean
.
Factor indicating whether the majority of
crowd workers labelled a sentence as referring to immigration or not. The
variable has missing values (NA
) for all non-annotated manifestos.
numeric; the direction
of statements coded as "Immigration" based on the aggregated crowd codings.
The variable is the mean of the scores assigned by workers who coded a
sentence and who allocated the sentence to the "Immigration" category. The
variable ranges from -1 ("Negative and closed immigration policy") to +1
(Favorable and open immigration policy). The variable has missing values
(NA
) for all non-annotated manifestos or if a sentence was not coded as
referring to immigration policy based on the aggregation of crowd codings.
integer; the number of coders who
contributed to the
mean score crowd_immigration_mean
.
A corpus object.
Benoit, K., Conway, D., Lauderdale, B.E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data. American Political Science Review, 100,(2), 278–295.
library("quanteda")
# keep only crowd coded manifestos (with respect to economic and social policy)
corp_crowdeconsocial <-
corpus_subset(data_corpus_manifestosentsUK, !is.na(crowd_econsocial_label))
# keep only crowd coded manifestos (with respect to immigration policy)
corp_crowdimmig <-
corpus_subset(data_corpus_manifestosentsUK, !is.na(crowd_immigration_label))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.