topic_score: The Topic SCORE algorithm


Description

This function estimates the word-topic matrix A from the word-document matrix X using the Topic SCORE algorithm.
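A minimal base-R sketch of the SCORE stage of the algorithm, following our reading of Ke & Wang (2017). It takes M as the diagonal matrix of row means of X, upper-truncates its diagonal at the Mquantile quantile, normalizes X by M^{-1/2}, and divides the 2nd to Kth leading left singular vectors entrywise by the first. The function name score_ratios is hypothetical and this is not the package's internal code. It also assumes a dense X whose rows all have positive mean.

```r
# Hypothetical sketch of the SCORE step (not the TopicScore internals).
# Assumes X is a dense p-by-n matrix whose rows all have positive mean.
score_ratios <- function(X, K, Mquantile = 0) {
  X <- as.matrix(X)
  d <- rowMeans(X)                           # diagonal of M = diag(X 1_n / n)
  d <- pmin(d, quantile(d, Mquantile))       # upper-truncate at the chosen quantile
  X_norm <- X / sqrt(d)                      # M^{-1/2} X: row j divided by sqrt(d[j])
  xi <- svd(X_norm, nu = K, nv = 0)$u        # p-by-K leading left singular vectors
  if (sum(xi[, 1]) < 0) xi[, 1] <- -xi[, 1]  # make the first vector positive
  xi[, -1, drop = FALSE] / xi[, 1]           # p-by-(K-1) ratio matrix R
}
```

With Mquantile = 0 every diagonal entry is capped at the minimum, so the normalization only rescales X by a constant and the singular vectors (hence R, up to column signs) match those of X itself.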

Usage

topic_score(K, X, K0, m, Mquantile = 0, scatterplot = FALSE,
  num_restart = 1, seed = NULL)

Arguments

K

The number of topics.

X

The p-by-n word-document matrix (p words, n documents), with each column being a distribution over a fixed vocabulary. This matrix can be of class simple_triplet_matrix from the slam package, or of any other class that can be coerced to class dgRMatrix from the Matrix package via the as function in the methods package.

K0

The number of greedy search steps in vertex hunting. If missing, it is set to ceiling(1.5*K).

m

The number of centers in the k-means step of vertex hunting. If missing, it is set to 10*K.

Mquantile

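The quantile level, between 0 and 1, used to upper-truncate the diagonal entries of M, the diagonal matrix of average word frequencies across documents: each diagonal entry is capped at this quantile of all the diagonal entries. When it is 0, every entry is capped at the minimum, so M becomes a multiple of the identity and the procedure reduces to no normalization. When it is 1, no truncation is applied. Default is 0.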

scatterplot

Logical; whether to draw a scatterplot of the rows of R. Default is FALSE.

num_restart

The number of random restarts in the k-means step of vertex hunting. Default is 1.

seed

The random seed. Default value is NULL.
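The documented defaults for K0 and m, and the roles of num_restart and seed, can be illustrated with base R's kmeans. The helper vh_kmeans below is hypothetical: it only clusters the rows of R into m centers and does not reproduce the greedy search or the remaining steps of the package's vertex hunting.

```r
# Hypothetical helper: documented defaults plus a k-means step on the rows of R.
# This is an illustration, not the package's vertex hunting code.
vh_kmeans <- function(R, K, K0, m, num_restart = 1, seed = NULL) {
  if (missing(K0)) K0 <- ceiling(1.5 * K)  # documented default for K0
  if (missing(m))  m  <- 10 * K            # documented default for m
  if (!is.null(seed)) set.seed(seed)       # makes the random initialization reproducible
  # nstart = num_restart runs k-means from that many random starts and keeps the best fit
  fit <- kmeans(R, centers = m, iter.max = 100, nstart = num_restart)
  list(K0 = K0, m = m, centers = fit$centers)
}
```

Calling it twice with the same seed returns identical centers; with seed = NULL the result depends on the current random number generator state.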

Value

A list containing

A_hat

The estimated p-by-K word-topic matrix.

R

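The p-by-(K-1) matrix of left singular vector ratios: entry (j, k) is the j-th entry of the (k+1)-th leading left singular vector divided by the j-th entry of the first one.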

V

The K-by-(K-1) vertex matrix; each row is a vertex, found by the vertex hunting algorithm, of the simplex formed by the rows of R.

Pi

The p-by-K convex combinations matrix, with each row being the convex combination coefficients of a row of R using V as vertices.

theta

The K0-by-(K-1) matrix of K0 potential vertices found in the greedy step of the vertex hunting algorithm.
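Given R and V, each row pi of Pi solves pi %*% cbind(V, 1) = c(r, 1): the weights reproduce the row r of R and sum to one. The sketch below computes Pi this way and then forms a column-normalized estimate of A. The helper recover_A and its inputs d (diagonal of M) and xi1 (first left singular vector) are hypothetical. Clipping negative weights and the M^{1/2} diag(xi_1) rescaling follow our reading of Ke & Wang (2017) and may differ in detail from the package.

```r
# Hypothetical sketch: convex-combination weights Pi and a column-normalized A_hat.
recover_A <- function(R, V, d, xi1) {
  # Row i of Pi satisfies Pi[i, ] %*% cbind(V, 1) = c(R[i, ], 1)
  Pi <- cbind(R, 1) %*% solve(cbind(V, 1))
  Pi[Pi < 0] <- 0                          # clip negative weights
  Pi <- Pi / rowSums(Pi)                   # renormalize each row to sum to one
  A_star <- sqrt(d) * abs(xi1) * Pi        # M^{1/2} diag(xi_1) Pi, applied row-wise
  A_hat <- sweep(A_star, 2, colSums(A_star), "/")  # each column sums to one
  list(Pi = Pi, A_hat = A_hat)
}
```

For points that lie inside the simplex, the recovered weights coincide with the true convex combination coefficients.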

Author(s)

Minzhe Wang

References

Ke, Z. T., & Wang, M. (2017). A new SVD approach to optimal topic estimation. arXiv preprint arXiv:1704.07016.

Examples

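library(TopicScore)

data("AP")  # load the AP word-document matrix
K <- 3
tscore_obj <- topic_score(K, AP)

# Visualize the result: rows of R, with the estimated vertices in red
plot(tscore_obj$R[, 1], tscore_obj$R[, 2], xlab = "R[, 1]", ylab = "R[, 2]")
points(tscore_obj$V[, 1], tscore_obj$V[, 2], col = "red", pch = 19)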

TopicScore documentation built on June 6, 2019, 5:06 p.m.