data_corpus_manifestosentsUK: Sentence-level corpus of UK party manifestos 1945-2017,...

data_corpus_manifestosentsUK    R Documentation

Sentence-level corpus of UK party manifestos 1945–2017, partially annotated

Description

A text corpus of sentences from publicly available party manifestos from the United Kingdom, published between 1945 and 2019. Some manifesto sentences have been rated in terms of the direction of policy using crowd-sourced coders.

The manifestos from the three main parties (Labour Party, Conservatives, Liberal Democrats) between 1987 and 2010 have been labelled as Economic Policy, Social Policy, or Neither, and rated in terms of the direction of Economic Policy and Social Policy. All party manifestos from the 2010 General Election have been crowd-coded for whether each sentence refers to immigration policy and, if so, for the direction of that policy. For more information on the coding approach see Benoit et al. (2016).

The corpus contains the aggregated crowd coding values at the sentence level. Note that the segmentation into sentences does not always work correctly due to missing punctuation. Very short and very long sentences can be removed using quanteda::corpus_trim().
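A minimal sketch of that trimming step follows. It assumes each sentence is treated as a whole document (what = "documents") and uses illustrative thresholds of 3 and 100 tokens, which are not taken from the original study. A small made-up corpus, corp_toy, stands in for data_corpus_manifestosentsUK so that the snippet runs without the data.

```r
library("quanteda")

# toy stand-in for data_corpus_manifestosentsUK (made-up sentences)
corp_toy <- corpus(c(
    s1 = "Yes.",
    s2 = "We will cut taxes for working families.",
    s3 = paste(rep("word", 150), collapse = " ")
))

# drop documents with fewer than 3 or more than 100 tokens
# (both thresholds are illustrative assumptions)
corp_trimmed <- corpus_trim(corp_toy, what = "documents",
                            min_ntoken = 3, max_ntoken = 100)

ndoc(corp_toy)      # number of documents before trimming
ndoc(corp_trimmed)  # number of documents after trimming
```

To apply the same step to the real data, corp_toy would be replaced by data_corpus_manifestosentsUK; suitable thresholds depend on the analysis.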

Usage

data_corpus_manifestosentsUK

Format

The corpus consists of 88,954 documents (i.e. sentences) and includes the following document-level variables:

party

factor; abbreviation of the party that wrote the manifesto.

partyname

factor; party that wrote the manifesto.

year

integer; 4-digit year of the election.

crowd_econsocial_label

factor; indicates the majority label assigned by crowd workers (Economic Policy, Social Policy, or Neither). The variable has missing values (NA) for all non-annotated manifestos.

crowd_econsocial_mean

numeric; the direction of statements coded as "Economic Policy" or "Social Policy" based on the aggregated crowd codings. The variable is the mean of the scores assigned by the workers who coded the sentence and who allocated the sentence to the "majority" category. The variable ranges from -2 to +2.

For the statements aggregated as "Economic Policy", -2 corresponds to "Very left" and +2 corresponds to "Very right". For the statements aggregated as "Social Policy", -2 corresponds to "Very liberal" and +2 corresponds to "Very conservative". The variable has missing values (NA) for all sentences that were aggregated as "Neither" and for all non-annotated manifestos.

crowd_econsocial_n

integer; the number of coders who contributed to the mean score crowd_econsocial_mean.

crowd_immigration_label

factor; indicates whether the majority of crowd workers labelled a sentence as referring to immigration or not. The variable has missing values (NA) for all non-annotated manifestos.

crowd_immigration_mean

numeric; the direction of statements coded as "Immigration" based on the aggregated crowd codings. The variable is the mean of the scores assigned by workers who coded a sentence and who allocated the sentence to the "Immigration" category. The variable ranges from -1 ("Negative and closed immigration policy") to +1 ("Favorable and open immigration policy"). The variable has missing values (NA) for all non-annotated manifestos or if a sentence was not coded as referring to immigration policy based on the aggregation of crowd codings.

crowd_immigration_n

integer; the number of coders who contributed to the mean score crowd_immigration_mean.

A corpus object.
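The crowd_econsocial_label, crowd_econsocial_mean, and crowd_econsocial_n variables can be read as the outcome of an aggregation over individual worker codings. The sketch below illustrates that logic for a single sentence. The codings, the data frame and its column names, and the use of the most frequent label as the "majority" are assumptions made for illustration; this is not the code used to build the corpus (see Benoit et al. 2016 for the actual approach).

```r
# hypothetical codings of one sentence by five crowd workers
codings <- data.frame(
    worker = c("w1", "w2", "w3", "w4", "w5"),
    label  = c("Economic Policy", "Economic Policy", "Social Policy",
               "Economic Policy", "Neither"),
    score  = c(-1, -2, 1, 0, NA)   # direction on the -2 to +2 scale
)

# majority label: here taken as the most frequently assigned category
label_counts <- table(codings$label)
majority_label <- names(label_counts)[which.max(label_counts)]

# scores from the workers who allocated the sentence to that category
majority_scores <- codings$score[codings$label == majority_label]

# sentence-level values in the shape of the document-level variables
aggregated <- data.frame(
    crowd_econsocial_label = majority_label,
    crowd_econsocial_mean  = mean(majority_scores),
    crowd_econsocial_n     = length(majority_scores)
)
aggregated
```

In this made-up case, three of five workers chose "Economic Policy", so only their three scores enter the mean and the count.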

References

Benoit, K., Conway, D., Lauderdale, B.E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data. American Political Science Review, 110(2), 278–295.

Examples


library("quanteda")
library("quanteda.classifiers")  # provides data_corpus_manifestosentsUK

# keep only crowd coded manifestos (with respect to economic and social policy)
corp_crowdeconsocial <-
    corpus_subset(data_corpus_manifestosentsUK, !is.na(crowd_econsocial_label))

# keep only crowd coded manifestos (with respect to immigration policy)
corp_crowdimmig <-
    corpus_subset(data_corpus_manifestosentsUK, !is.na(crowd_immigration_label))
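Once the coded sentences are selected, the sentence-level scores can be summarised per manifesto. The following sketch averages crowd_econsocial_mean by party and year for sentences labelled "Economic Policy". It uses a made-up corpus (corp_toy, with hypothetical party codes "A" and "B" and invented scores) in place of data_corpus_manifestosentsUK, and the unweighted mean is one possible choice rather than the method of the original study.

```r
library("quanteda")

# toy stand-in for data_corpus_manifestosentsUK; texts, party codes,
# and scores are made up for illustration
corp_toy <- corpus(
    c("Sentence one.", "Sentence two.", "Sentence three.", "Sentence four."),
    docvars = data.frame(
        party = factor(c("A", "A", "B", "B")),
        year  = c(2010L, 2010L, 2010L, 2010L),
        crowd_econsocial_label = factor(c("Economic Policy", "Economic Policy",
                                          "Economic Policy", "Social Policy")),
        crowd_econsocial_mean  = c(-1.5, -0.5, 1.0, 0.5)
    )
)

# keep sentences whose majority label is "Economic Policy"
corp_econ <- corpus_subset(corp_toy,
                           crowd_econsocial_label == "Economic Policy")

# unweighted mean of the sentence-level scores by party and year
pos_econ <- aggregate(crowd_econsocial_mean ~ party + year,
                      data = docvars(corp_econ), FUN = mean)
pos_econ
```

Negative values indicate positions towards "Very left" and positive values towards "Very right". Weighting each sentence by crowd_econsocial_n would be an alternative to the unweighted mean.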


quanteda/quanteda.classifiers documentation built on Oct. 20, 2023, 6:53 a.m.