split_by_doc_id: Turn a kRp.corpus object into a list of kRp.text objects


Description

Some analysis steps work on individual tagged texts rather than on one large corpus object. This method splits a kRp.corpus object into a list of kRp.text objects, one per document.

Usage

## S4 method for signature 'kRp.corpus'
split_by_doc_id(obj, keepFeatures = TRUE)

Arguments

obj

An object of class kRp.corpus.

keepFeatures

Either a logical value, indicating whether to keep all features (TRUE) or drop them (FALSE), or a character vector naming the features to keep, if they are present.
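To make the two accepted forms of keepFeatures concrete, here is a minimal base R sketch of how such a value could be applied to a named list of features. The helper keep_features_sketch() and the feature names are hypothetical and only illustrate the documented semantics; this is not the package's implementation.

```r
# Hypothetical helper, NOT part of tm.plugin.koRpus: applies a keepFeatures
# value (logical or character) to a named list of features.
keep_features_sketch <- function(features, keepFeatures = TRUE) {
  if (is.logical(keepFeatures)) {
    # TRUE keeps every feature, anything else (FALSE, NA) drops all of them
    if (isTRUE(keepFeatures)) features else features[0]
  } else if (is.character(keepFeatures)) {
    # keep only the requested names that are actually present
    features[intersect(keepFeatures, names(features))]
  } else {
    stop("keepFeatures must be logical or character")
  }
}

# hypothetical feature names, chosen for illustration only
feats <- list(freq = 1:3, lex_div = "TTR", readability = "Flesch")
names(keep_features_sketch(feats, TRUE))                # "freq" "lex_div" "readability"
length(keep_features_sketch(feats, FALSE))              # 0
names(keep_features_sketch(feats, c("freq", "hyphen"))) # "freq"
```

Note how a requested name that is absent ("hyphen" above) is silently ignored, matching the "if present" wording of the argument description.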

Value

A named list of objects of class kRp.text. Elements are named by their doc_id.
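The shape of the return value, one list element per document and named by doc_id, is analogous to what base R's split() produces for a data frame. The sketch below is only that analogy: the toy table and its column names (doc_id, token) are assumptions for illustration and do not reflect the internal structure of a kRp.corpus object.

```r
# Conceptual analogue only: a toy token table standing in for a corpus.
# The column names doc_id and token are assumptions for illustration.
tokens <- data.frame(
  doc_id = c("doc_a", "doc_a", "doc_b", "doc_b", "doc_b"),
  token  = c("The", "cat", "A", "dog", "."),
  stringsAsFactors = FALSE
)

# split() returns a list with one element per doc_id, named by doc_id
per_doc <- split(tokens, tokens$doc_id)
names(per_doc)           # "doc_a" "doc_b"
nrow(per_doc[["doc_b"]]) # 3
```

With the real method, the list returned by split_by_doc_id() can be indexed the same way, e.g. by a document's doc_id.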

Examples

# use readCorpus() to create an object of class kRp.corpus
# code is only run when the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  myCorpus <- readCorpus(
    dir=file.path(path.package("tm.plugin.koRpus"), "examples", "corpus"),
    hierarchy=list(
      Topic=c(
        Winner="Reality Winner",
        Edwards="Natalie Edwards"
      ),
      Source=c(
        Wikipedia_prev="Wikipedia (old)",
        Wikipedia_new="Wikipedia (new)"
      )
    ),
    # use tokenize() so examples run without a TreeTagger installation
    tagger="tokenize",
    lang="en"
  )

  myCorpusList <- split_by_doc_id(myCorpus)
}

unDocUMeantIt/tm.plugin.koRpus documentation built on May 21, 2021, 9:27 p.m.