enhance_manifesto_df: Enhance manifesto data frame


View source: R/enhance_manifesto_df.R

Description

This function takes a manifesto data frame (see as_tibble.ManifestoCorpus and as_tibble.ManifestoDocument) and enhances it with quasi-sentence, sentence, and bloc counters, as well as a role indicator that distinguishes quasi-sentence text (value 'qs') from title, header, and meta text.

This text-level information is inferred from the columns 'text' and 'cmp_code'.

Usage

enhance_manifesto_df(x)

Arguments

x

A manifesto data frame with the two required columns: 'text' and 'cmp_code'
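Because the function depends on these two columns, it can be useful to check for them before calling it. The following is a minimal base R sketch; has_required_cols is a hypothetical helper written for illustration and is not part of manifestoEnhanceR.

```r
# Hypothetical helper (not part of manifestoEnhanceR):
# TRUE if x is a data frame that has both required columns.
has_required_cols <- function(x) {
  is.data.frame(x) && all(c("text", "cmp_code") %in% names(x))
}

ok  <- data.frame(text = "Some text.", cmp_code = "000", stringsAsFactors = FALSE)
bad <- data.frame(text = "Some text.", stringsAsFactors = FALSE)

has_required_cols(ok)   # TRUE
has_required_cols(bad)  # FALSE
```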

Value

The input x as a manifesto.df object (inherits from tibble), enhanced by the columns 'qs_nr' (running quasi-sentence counter), 'sent_nr' (running sentence counter), 'role' (indicator, here 'qs' for all rows), and 'bloc_nr' (enumerates consecutive rows by 'role').

In addition, the returned manifesto.df object has two attributes:

  1. 'annotated': indicates whether or not the input manifesto has been annotated/coded by CMP experts.

  2. 'extra_cols': names of columns added by enhancing the input data frame.
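One plausible reading of the 'bloc_nr' description is that every unbroken run of rows sharing the same 'role' gets the next bloc number. The base R sketch below illustrates that reading with run-length encoding. The role vector is made up for illustration, and the package's actual implementation may differ.

```r
# Illustrative role sequence (assumed, not produced by the package).
role <- c("title", "title", "meta", "header", "qs", "qs", "qs", "header", "qs")

# rle() finds runs of identical consecutive values;
# each run is then numbered and expanded back to row length.
runs <- rle(role)
bloc_nr <- rep(seq_along(runs$lengths), times = runs$lengths)

bloc_nr
# 1 1 2 3 4 4 4 5 6
```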

Note

As one natural sentence may contain multiple quasi-sentences, the latter map m:1 to the former.
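The m:1 relation can be illustrated with a small base R sketch. It assumes that a sentence ends whenever a quasi-sentence ends in '.', '!' or '?'. This rule is an assumption made for illustration and is not necessarily how the package derives 'sent_nr'.

```r
text <- c(
  "This is the first full sentence.",
  "This is the second,",
  "but splitted sentence.",
  "This is the third sentence."
)

# Every row is one quasi-sentence.
qs_nr <- seq_along(text)

# Assumed rule: a row closes a sentence if it ends in terminal punctuation.
ends_sentence <- grepl("[.!?]$", text)

# A new sentence starts on the first row and after every row that closes one.
starts_sentence <- c(TRUE, head(ends_sentence, -1))
sent_nr <- cumsum(starts_sentence)

data.frame(qs_nr, sent_nr)
# quasi-sentences 2 and 3 share sentence number 2
```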

For each row, the indicator variable 'role' may assume one of four values:

  1. 'qs': quasi-sentence

  2. 'title': the first row(s) with CMP code 'H' or NA (only in annotated manifestos)

  3. 'header': subsequent rows with CMP code 'H' or NA (only in annotated manifestos)

  4. 'meta': in annotated manifestos containing 'H' codes, the row(s) between 'title' and the first 'header' rows
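The rules above leave some room for interpretation. The base R sketch below implements one possible reading for an annotated manifesto: the unbroken run of 'H' rows at the top is the title, NA-coded rows between the title and the next 'H' row are meta, and all remaining 'H' or NA rows are headers. assign_role is a hypothetical function written for illustration and is not the package's implementation.

```r
# Hypothetical helper (not part of manifestoEnhanceR).
assign_role <- function(cmp_code) {
  is_h <- !is.na(cmp_code) & cmp_code == "H"

  # Default: rows coded 'H' or NA are headers, all others quasi-sentences.
  role <- ifelse(is_h | is.na(cmp_code), "header", "qs")

  # Title: the unbroken run of 'H' rows at the very top.
  n_title <- if (all(is_h)) length(is_h) else which(!is_h)[1] - 1
  role[seq_len(n_title)] <- "title"

  # Meta: NA-coded rows after the title and before the next 'H' row.
  idx <- seq_along(cmp_code)
  later_h <- which(is_h & idx > n_title)
  if (n_title > 0 && length(later_h) > 0) {
    between <- idx > n_title & idx < later_h[1]
    role[between & is.na(cmp_code)] <- "meta"
  }
  role
}

cmp_code <- c("H", "H", NA, "H", "000", "000", "000", "H", "000")
assign_role(cmp_code)
# "title" "title" "meta" "header" "qs" "qs" "qs" "header" "qs"
```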

Examples

## Not run: 
library(tibble)
library(manifestoEnhanceR)

man <- tribble(
  ~manifesto_id, ~text, ~cmp_code,
  "123", "main title", "H",
  "123", "sub title", "H",
  "123", "Publisher etc", NA_character_,
  "123", "first section", "H",
  "123", "This is the first full sentence.", "000",
  "123", "This is the second,", "000",
  "123", "but splitted sentence.", "000",
  "123", "second section", "H",
  "123", "This is the third sentence.", "000"
)

enhanced <- enhance_manifesto_df(man)
class(enhanced)
nrow(man) == nrow(enhanced)
ncol(man) < ncol(enhanced)
attr(enhanced, "annotated")
attr(enhanced, "extra_cols")

## End(Not run)

haukelicht/manifestoEnhanceR documentation built on March 30, 2020, 3:15 a.m.