region_matrix_ops | R Documentation |
Get IDs and Counts for Region Matrices.
region_matrix_to_ids(
  corpus,
  p_attribute,
  registry = Sys.getenv("CORPUS_REGISTRY"),
  matrix
)

region_matrix_to_count_matrix(
  corpus,
  p_attribute,
  registry = Sys.getenv("CORPUS_REGISTRY"),
  matrix
)

region_matrix_context(
  corpus,
  registry = Sys.getenv("CORPUS_REGISTRY"),
  matrix,
  p_attribute,
  s_attribute,
  boundary,
  left,
  right
)

ranges_to_cpos(ranges)
corpus | a CWB corpus |
p_attribute | a positional attribute |
registry | registry directory |
matrix | a regions matrix |
s_attribute | If not |
boundary | Structural attribute (length-one |
left | An |
right | An |
ranges | A two-column integer |
ranges_to_cpos() turns a matrix of ranges into an integer vector of the individual corpus positions covered by the ranges.
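The expansion described above can be sketched in base R. This is an illustration only, not RcppCWB's implementation: the helper name expand_ranges() and the sample matrix are hypothetical, and the sketch assumes that each row holds an inclusive start and end position.

```r
# Hypothetical ranges: each row is (start, end), assumed inclusive.
ranges <- matrix(c(0L, 3L, 10L, 12L), ncol = 2, byrow = TRUE)

# Illustrative helper (not part of RcppCWB): expand every row into the
# sequence start:end and concatenate the results into one integer vector.
expand_ranges <- function(ranges) {
  unlist(
    lapply(seq_len(nrow(ranges)), function(i) ranges[i, 1]:ranges[i, 2]),
    use.names = FALSE
  )
}

cpos <- expand_ranges(ranges)
cpos
```

Under these assumptions, the two ranges cover the positions 0 to 3 and 10 to 12, so the result has seven elements.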
# Scenario 1: Get full text for a subcorpus defined by regions
m <- get_region_matrix(
  corpus = "REUTERS", s_attribute = "places",
  strucs = 4L:5L, registry = get_tmp_registry()
)
ids <- region_matrix_to_ids(
  corpus = "REUTERS", p_attribute = "word",
  registry = get_tmp_registry(), matrix = m
)
tokenstream <- cl_id2str(
  corpus = "REUTERS", p_attribute = "word",
  registry = get_tmp_registry(), id = ids
)
txt <- paste(tokenstream, collapse = " ")
txt

# Scenario 2: Get data.frame with counts for region matrix
y <- region_matrix_to_count_matrix(
  corpus = "REUTERS", p_attribute = "word",
  registry = get_tmp_registry(), matrix = m
)
df <- as.data.frame(y)
colnames(df) <- c("token_id", "count")
df[["token"]] <- cl_id2str(
  "REUTERS", p_attribute = "word",
  registry = get_tmp_registry(), id = df[["token_id"]]
)
# Assign the sorted result so that head() shows the most frequent tokens
df <- df[order(df[["count"]], decreasing = TRUE), ]
head(df)
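
Scenario 2 labels the two columns of the count matrix as token id and count. The same shape can be approximated in base R from a plain vector of token ids. This is a sketch under that assumption, not the package's method, and the id values are made up for illustration.

```r
# Hypothetical token ids, standing in for the ids found in a set of regions.
ids <- c(5L, 2L, 5L, 7L, 2L, 5L)

# Tabulate how often each id occurs; table() sorts the distinct ids.
tab <- table(ids)

# Two-column integer matrix: column 1 = token id, column 2 = count.
count_matrix <- cbind(as.integer(names(tab)), as.integer(tab))
count_matrix
```

For real corpora, region_matrix_to_count_matrix() avoids materialising the full id vector in R first.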