Encode CWB Corpus.
encode(.Object, ...)
## S4 method for signature 'data.frame'
encode(.Object, name, pAttributes = "word",
  sAttributes = NULL, registry = Sys.getenv("CORPUS_REGISTRY"),
  indexedCorpusDir = NULL, verbose = TRUE)
## S4 method for signature 'data.table'
encode(.Object, corpus, sAttribute)
.Object | a data.frame to encode
... | further parameters
name | name of the (new) CWB corpus
pAttributes | columns of .Object with tokens (such as word/pos/lemma)
sAttributes | columns of .Object that will be encoded as structural attributes
registry | path to the corpus registry
indexedCorpusDir | directory in which the directory for the indexed corpus files is created
verbose | logical, whether to be verbose
corpus | the name of the CWB corpus
sAttribute | a single s-attribute
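The argument descriptions suggest a token-level input table: one row per token, with the columns named in pAttributes holding the tokens and their annotations, and the columns named in sAttributes holding metadata that is repeated for every token of a text. Below is a minimal base-R sketch of such a table. The column names doc_id and language, the sample sentences, and the corpus name "toy" are made up for illustration, and the call to encode() is left commented out because it presumably needs a working CWB installation and registry.

```r
# Hypothetical input: two short texts with document-level metadata.
texts <- data.frame(
  doc_id = c("a", "b"),
  language = c("en", "en"),
  text = c("Oil prices rose", "Crude output fell sharply"),
  stringsAsFactors = FALSE
)

# Split each text on whitespace to get one token vector per document.
tokens <- strsplit(texts[["text"]], "\\s+")
n_tokens <- lengths(tokens)

# One row per token; metadata is repeated for every token of its document.
tokenized <- data.frame(
  doc_id = rep(texts[["doc_id"]], times = n_tokens),
  language = rep(texts[["language"]], times = n_tokens),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)

# With the package and CWB available, the call might look like this
# (the corpus name "toy" is an assumption, not taken from the documentation):
# encode(tokenized, name = "toy", pAttributes = "word", sAttributes = "language")
```

Here word would serve as the positional attribute and language as a structural attribute, mirroring the defaults shown in the usage section.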
## Not run:
library(tm)
library(tibble)
library(tidytext)
library(plyr)
reut21578 <- system.file("texts", "crude", package = "tm")
reuters.tm <- VCorpus(DirSource(reut21578), list(reader = readReut21578XMLasPlain))
reuters.tibble <- tidy(reuters.tm)
reuters.tibble[["topics_cat"]] <- sapply(
  reuters.tibble[["topics_cat"]],
  function(x) paste(x, collapse = "|")
)
reuters.tibble[["places"]] <- sapply(
  reuters.tibble[["places"]],
  function(x) paste(x, collapse = "|")
)
reuters.tidy <- unnest_tokens(
  reuters.tibble, output = "word", input = "text", to_lower = FALSE
)
encode(reuters.tidy, name = "reuters", sAttributes = c("language", "places"))
## End(Not run)
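The sapply() calls in the example collapse each element of a column into a single "|"-separated string, presumably because those columns can hold several values per document and each structural attribute needs one value per cell. A small standalone sketch with made-up values (the list below is not taken from the Reuters data):

```r
# Hypothetical list-column: one entry per document, zero or more values each.
places <- list(c("usa", "canada"), character(0), "japan")

# Collapse every entry into one string; an empty entry becomes "".
flat <- sapply(places, function(x) paste(x, collapse = "|"))
```

After this step flat is a plain character vector with one element per document, so it can sit in an ordinary data.frame column.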