PolMine/fulltext: Toolset to generate fulltext display from corpus data

Purpose

The fulltext package includes a lightweight htmlwidget for fulltext output.

Getting Started

library(janeaustenr)
library(tokenizers)
library(fulltext)
library(polmineR)
library(magrittr)
library(data.table)

From tidytext to fulltext

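# Assign each line of the novel to a paragraph, using blank lines as boundaries
ftxt_list <- cut(seq_along(emma), c(1, grep("^\\s*$", emma), length(emma))) %>%
  split(janeaustenr::emma, f = .) %>%
  # Collapse the lines of each paragraph into one string, then tokenize
  lapply(paste, collapse = " ") %>%
  tokenizers::tokenize_words(lowercase = FALSE, strip_punct = FALSE) %>%
  as.fulltexttable() %>%
  # Split the table at rows whose tag_before matches "<para"
  split(column = "tag_before", regex = "<para") %>%
  # Retag "para" sections matching "CHAPTER" as h2
  retag(regex = "CHAPTER", old = "para", new = "h2") %>%
  as.fulltexttable() %>%
  # Split into one table per chapter and name the chapters
  split(column = "token", regex = "CHAPTER") %>%
  rename(name = sprintf("Chapter %d", seq_along(.)))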

fulltext(ftxt_list[[1]])
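
The first steps of the pipeline above group the lines of the novel into paragraphs at blank lines and paste each group into a single string. As a minimal sketch of that idea on toy input, using only base R: note that it uses cumsum() rather than the cut() call of the pipeline, and the toy lines and the names toy_lines, para_id and paragraphs are made up for illustration.

```r
# Toy input standing in for janeaustenr::emma (illustrative only)
toy_lines <- c(
  "CHAPTER I",
  "",
  "Emma Woodhouse, handsome,",
  "clever, and rich.",
  "",
  "She was the youngest."
)

# TRUE for lines that are empty or contain only whitespace
blank <- grepl("^\\s*$", toy_lines)

# The running count of blank lines serves as a paragraph id
para_id <- cumsum(blank)

# Group the non-blank lines by paragraph id and paste each group together
paragraphs <- split(toy_lines[!blank], para_id[!blank])
paragraphs <- vapply(paragraphs, paste, character(1), collapse = " ")

print(unname(paragraphs))
```

Each element of paragraphs is then one paragraph as a single string, which is the shape of input the tokenizer step of the pipeline receives.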

From a subcorpus to fulltext

We introduce the fulltext package by example. In addition to fulltext, we need the polmineR package, which includes the GERMAPARLMINI corpus.

library(polmineR)
use("polmineR")

The example outputs one particular speech: a speech given by Volker Kauder in the German Bundestag.

sp <- corpus("GERMAPARLMINI") %>%
  subset(speaker == "Volker Kauder") %>%
  subset(date == "2009-11-10")

The data passed to the JavaScript that generates the output is expected to be a list of lists, each of which describes one section of text. Each sub-list is a named list with two parts: a character vector giving the HTML element the section will be wrapped in, and a data.frame (or a list) with a column "token" and a column "id".

ftab <- as.fulltexttable(sp, headline = "Volker Kauder (CDU)", display = "block")
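
To make the structure described above more concrete, here is a hand-built sketch in base R. The field names element and tokenstream are hypothetical, since the description does not name them, and the tokens and ids are invented. It is not necessarily the object that as.fulltexttable() returns.

```r
# A list of sections; each section is a named list with an HTML element
# and a table of tokens. Field names are assumptions made for illustration.
sections <- list(
  list(
    element = "h2",
    tokenstream = data.frame(
      token = c("Volker", "Kauder", "(", "CDU", ")"),
      id = 1:5,
      stringsAsFactors = FALSE
    )
  ),
  list(
    element = "p",
    tokenstream = data.frame(
      token = c("Sehr", "geehrte", "Damen", "und", "Herren", "!"),
      id = 6:11,
      stringsAsFactors = FALSE
    )
  )
)

# Check that every section has the two parts the description asks for
valid <- vapply(
  sections,
  function(s) {
    is.character(s$element) &&
      all(c("token", "id") %in% names(s$tokenstream))
  },
  logical(1)
)
print(valid)
```

A check of this kind can be useful before passing hand-made data to a widget, because a missing "token" or "id" column would otherwise only surface on the JavaScript side.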

So let us see the result ...

fulltext(ftab, box = TRUE)

Crosstalking

Preparations

ftxt_list <- lapply(
  setNames(names(ftxt_list), names(ftxt_list)),
  function(chapter) data.frame(ftxt_list[[chapter]], chapter = chapter)
)
ftxt_list <- ftxt_list[1:5]

library(crosstalk)
austen_chapters <- do.call(rbind, ftxt_list)
austen_chapters[["tag_before"]] <- gsub("display:block", "display:none", austen_chapters[["tag_before"]])
sd <- crosstalk::SharedData$new(austen_chapters, ~chapter, group = "fulltext")
chapters_table <- data.frame(chapter = unique(austen_chapters$chapter))
chapters_table_sd <- crosstalk::SharedData$new(chapters_table, ~chapter, group = "fulltext")
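
The preparation above rewrites the tag_before column and uses the chapter name as a key shared by two tables. As a rough base R illustration with toy data: the tag string below is made up, because the real content of tag_before is not shown here, and the name toy is hypothetical.

```r
# Toy token table with an assumed tag string in tag_before
toy <- data.frame(
  token = c("CHAPTER", "I", "Emma", "CHAPTER", "II", "Mr"),
  tag_before = "<span style='display:block'>",  # hypothetical tag content
  chapter = rep(c("Chapter 1", "Chapter 2"), each = 3),
  stringsAsFactors = FALSE
)

# Switch every section from visible to hidden
toy[["tag_before"]] <- gsub(
  "display:block", "display:none", toy[["tag_before"]], fixed = TRUE
)

# One row per chapter: the key values shared by both tables
keys <- data.frame(chapter = unique(toy$chapter))
print(keys)
```

In the actual example, both tables are wrapped in crosstalk::SharedData objects with the same key and group, which is presumably what lets a selection in one widget address the matching rows in the other.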

y <- bscols(
  # widths = c(NA,NA),
  DT::datatable(
    chapters_table_sd,
    options = list(lengthChange = TRUE, pageLength = 8L, pagingType = "simple", dom = "tp"),
    rownames = NULL, width = "100%", selection = "single"
  ),
  fulltext(sd, width = "100%", box = TRUE)
)