germaparl_topics: Use topicmodels prepared for GermaParl.

Description

A set of LDA topic models (k between 100 and 450) is part of the Zenodo release of GermaParl. These topic models can be downloaded using germaparl_download_lda and loaded using germaparl_load_lda.

Usage

germaparl_download_lda(
  k = c(100L, 150L, 175L, 200L, 225L, 250L, 275L, 300L, 350L, 400L, 450L),
  doi = "10.5281/zenodo.3742113",
  data_dir,
  sample = FALSE,
  verbose = TRUE
)

germaparl_load_lda(
  k,
  registry_dir = cwbtools::cwb_registry_dir(),
  verbose = TRUE,
  sample = FALSE
)

Arguments

k

A numeric or integer vector, the number of topics of the topicmodel. Multiple values can be provided to download several topic models at once.

doi

The DOI of GermaParl at Zenodo.

data_dir

The data directory with the binary files of the GERMAPARL corpus. If missing, the directory will be guessed using the function cwbtools::cwb_corpus_dir.

sample

A logical value, if TRUE, use GERMAPARLSAMPLE corpus rather than GERMAPARL.

verbose

A logical value, whether to output progress messages.

registry_dir

The registry directory where the registry file for GERMAPARL is located.

Details

The function germaparl_download_lda will download an rds-file that will be stored in the data directory of the GermaParl corpus.

germaparl_load_lda will load a topic model into memory. The function returns an LDA_Gibbs topic model if the model for k is present, and NULL if the model has not yet been downloaded.
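
Because a missing model yields NULL, the two functions can be combined into a load-or-download step. The following is a minimal sketch rather than part of the package: the helper name load_or_download_lda is hypothetical, and it assumes that GermaParl is installed and that data_dir and registry_dir belong to the same CWB setup.

```r
# Hypothetical helper (not part of GermaParl): load the topic model for k and
# download it first if it is not yet available locally.
load_or_download_lda <- function(k, data_dir, registry_dir, sample = FALSE) {
  lda <- GermaParl::germaparl_load_lda(
    k = k, registry_dir = registry_dir, sample = sample, verbose = FALSE
  )
  if (is.null(lda)) {
    # NULL signals that the rds-file for this k has not been downloaded yet
    GermaParl::germaparl_download_lda(
      k = k, data_dir = data_dir, sample = sample
    )
    lda <- GermaParl::germaparl_load_lda(
      k = k, registry_dir = registry_dir, sample = sample, verbose = FALSE
    )
  }
  lda # may still be NULL if the download did not succeed
}
```

The helper only touches the network when the first load attempt returns NULL, so repeated calls with the same k reuse the stored rds-file.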

Value

The function germaparl_download_lda will (invisibly) return TRUE if the operation has been successful and FALSE if not.
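
Since the value is returned invisibly, it is not printed at the console and has to be assigned to be checked. A minimal sketch of such a check follows; the wrapper name download_lda_or_stop is hypothetical, GermaParl is assumed to be installed, and a single value of k is assumed because the return value for several values of k is not described here.

```r
# Hypothetical wrapper (not part of GermaParl): turn a failed download into an
# error instead of a silently returned FALSE. Assumes a single value for k.
download_lda_or_stop <- function(k, data_dir, sample = FALSE) {
  ok <- GermaParl::germaparl_download_lda(
    k = k, data_dir = data_dir, sample = sample
  )
  if (!isTRUE(ok)) {
    stop("Download of the topic model for k = ", k, " failed.")
  }
  invisible(ok)
}
```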

Examples

# This example assumes that the directories used by the CWB do not yet exist, so
# temporary directories are created.
cwb_dirs <- cwbtools::create_cwb_directories(prefix = tempdir(), ask = FALSE)

samplemode <- TRUE
corpus_id <- "GERMAPARLSAMPLE" # for full corpus: corpus_id <- "GERMAPARL"

dir.create(file.path(cwb_dirs[["corpus_dir"]], tolower(corpus_id)))

# Download topic model
germaparl_download_lda(
  k = 30, # k = 250 recommended for full GERMAPARL corpus
  data_dir = file.path(cwb_dirs[["corpus_dir"]], tolower(corpus_id)),
  sample = samplemode
)
lda <- germaparl_load_lda(
  k = 30L, registry_dir = cwb_dirs[["registry_dir"]],
  sample = samplemode
)
lda_terms <- topicmodels::terms(lda, 10)
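
As a possible follow-up to the example, the matrix of top terms can be searched for topics that feature a given word. The sketch below is an illustration only: the helper topics_with_term is hypothetical, it assumes that topicmodels::terms(lda, n) returns a character matrix with one named column per topic, and the toy matrix with made-up words stands in for lda_terms so that the code runs without a downloaded model.

```r
# Hypothetical helper: names of the topics whose top terms include `term`.
# `top_terms` is a character matrix with one column per topic, as assumed for
# the result of topicmodels::terms(lda, n).
topics_with_term <- function(top_terms, term) {
  hits <- apply(top_terms, 2, function(column) term %in% column)
  names(hits)[hits]
}

# Toy stand-in for lda_terms; the words are invented for illustration
toy_terms <- matrix(
  c("Steuer", "Haushalt", "Euro",
    "Schule", "Bildung", "Euro"),
  nrow = 3,
  dimnames = list(NULL, c("Topic 1", "Topic 2"))
)

topics_with_term(toy_terms, "Bildung") # "Topic 2"
topics_with_term(toy_terms, "Euro")    # "Topic 1" "Topic 2"
```

With a loaded model, the same call would be topics_with_term(lda_terms, "Bildung").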

GermaParl documentation built on Oct. 23, 2020, 8:27 p.m.