region_matrix_ops: Get IDs and Counts for Region Matrices.


Description

Get token ids, or a matrix of token ids and their counts, for the corpus positions covered by a regions matrix.

Usage

region_matrix_to_ids(corpus, p_attribute,
  registry = Sys.getenv("CORPUS_REGISTRY"), matrix)

region_matrix_to_count_matrix(corpus, p_attribute,
  registry = Sys.getenv("CORPUS_REGISTRY"), matrix)

Arguments

corpus

name of a CWB corpus, e.g. "REUTERS"

p_attribute

name of a positional attribute (p-attribute) of the corpus, e.g. "word"

registry

path to the registry directory; defaults to the value of the CORPUS_REGISTRY environment variable

matrix

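a regions matrix with the left and right corpus positions of each region in two columns, e.g. as returned by get_region_matrix()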
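To make the matrix argument more concrete, the following base R sketch mimics conceptually what resolving a regions matrix to token ids involves. It does not call RcppCWB and is not the package's implementation; the vector token_ids is a hypothetical stand-in for a p-attribute, and the sketch assumes that each row of the matrix holds an inclusive pair of zero-based corpus positions.

```r
# Toy stand-in for a corpus: token ids at corpus positions 0..9 (hypothetical data).
token_ids <- c(3L, 1L, 4L, 1L, 5L, 9L, 2L, 6L, 5L, 3L)

# A regions matrix with two regions: corpus positions 1-3 and 6-8 (inclusive).
regions <- matrix(c(1L, 3L, 6L, 8L), ncol = 2, byrow = TRUE)

# Expand every row into the full sequence of corpus positions it covers.
cpos <- unlist(lapply(
  seq_len(nrow(regions)),
  function(i) regions[i, 1]:regions[i, 2]
))

# Corpus positions are zero-based, R vectors are one-based, hence the + 1.
ids <- token_ids[cpos + 1L]
ids
```

With a real corpus, the lookup in the last step is what region_matrix_to_ids() performs against the CWB index files instead of an in-memory vector.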

Examples

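library(RcppCWB)

# Use the registry shipped with the package, or a temporary copy if its files are not usable
registry <- if (!check_pkg_registry_files()) use_tmp_registry() else get_pkg_registry()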

# Scenario 1: Get full text for a subcorpus defined by regions
m <- get_region_matrix(
  corpus = "REUTERS", s_attribute = "places",
  strucs = 4L:5L, registry = registry
  )
ids <- region_matrix_to_ids(
  corpus = "REUTERS", p_attribute = "word",
  registry = registry, matrix = m
  )
tokenstream <- cl_id2str(
  corpus = "REUTERS", p_attribute = "word",
  registry = registry, id = ids
  )
txt <- paste(tokenstream, collapse = " ")
txt

# Scenario 2: Get data.frame with counts for region matrix
y <- region_matrix_to_count_matrix(
  corpus = "REUTERS", p_attribute = "word",
  registry = registry, matrix = m
  )
df <- as.data.frame(y)
colnames(df) <- c("token_id", "count")
df[["token"]] <- cl_id2str(
  corpus = "REUTERS", p_attribute = "word",
  registry = registry, id = df[["token_id"]]
  )
# Assign the sorted result so that head() shows the most frequent tokens
df <- df[order(df[["count"]], decreasing = TRUE), ]
head(df)
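
The counting step of Scenario 2 can also be tried without a corpus. The base R sketch below tabulates a small, hypothetical vector of token ids into a two-column matrix of ids and counts and then sorts it. It only illustrates the shape of the result under that assumption and is not how RcppCWB computes it.

```r
# Hypothetical token ids, as they might be returned for a set of regions.
ids <- c(1L, 4L, 1L, 2L, 6L, 5L)

# Tabulate the ids and arrange them as a two-column matrix: id and frequency.
tab <- table(ids)
counts <- cbind(
  token_id = as.integer(names(tab)),
  count = as.integer(tab)
)

# Convert to a data.frame and keep the sorted result by assigning it.
df <- as.data.frame(counts)
df <- df[order(df[["count"]], decreasing = TRUE), ]
head(df)
```

The counts always sum to the number of ids that went in, which is a quick sanity check for a count matrix of this kind.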

RcppCWB documentation built on Oct. 22, 2018, 5:08 p.m.