medSTC: A max-margin Sparse Topical Coding model (Med-STC) for...


View source: R/medSTCSource.R

Description

MedSTC is a max-margin (supervised) Sparse Topical Coding model for document classification, developed by Prof. Jun Zhu (http://www.ml-thu.net/~jun/).

Usage

medSTC(documents, mlabels, ntopics, initial_c=0.5, lambda=1, rho=0.01, delta_ell=3600, supervised=TRUE, 
primal_svm=1, var_max_iter=20, convergence=1e-4, em_max_iter=100, em_convergence=1e-4, 
svm_alg_type=2, output_dir=".") 

Arguments

documents

A list whose length is equal to the number of documents, D. Each element of documents is an integer matrix with two rows. Each column of documents[[i]] (i.e., document i) represents a word occurring in the document.

documents[[i]][1, j] is a 0-indexed word identifier for the jth word in document i. documents[[i]][2,j] is an integer specifying the number of times that word appears in the document.
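
The format described above can be illustrated with a minimal sketch in base R. The toy vocabulary, the count matrix, and the helper to_document are hypothetical and are not part of the package; only the two-row, 0-indexed layout comes from the argument description.

```r
# Toy corpus (hypothetical data): rows are documents, columns are vocabulary terms.
vocab <- c("topic", "model", "sparse", "coding")
counts <- matrix(c(2L, 0L, 1L, 0L,
                   0L, 3L, 0L, 1L),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, vocab))

# Hypothetical helper: turn one row of counts into the two-row integer matrix.
to_document <- function(row) {
  present <- which(row > 0L)
  rbind(as.integer(present - 1L),   # row 1: 0-indexed word identifiers
        as.integer(row[present]))   # row 2: number of occurrences
}

# One matrix per document, collected in a list of length D.
documents <- lapply(seq_len(nrow(counts)), function(i) to_document(counts[i, ]))
```

Here documents[[1]] has one column per distinct word in the first document; words with a zero count are simply left out.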

mlabels

The training labels for the documents.

ntopics

Number of topics to be used in modeling the corpus.

initial_c, lambda, rho

These are positive-valued regularization constants. Default values, as given in Usage, are initial_c=0.5, lambda=1, rho=0.01.

delta_ell

The cost parameter of the SVM loss function, i.e., the 0/delta_ell loss. Only positive values are allowed. Default value is 3600.
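
One plausible reading of this loss, sketched in base R: a correct prediction costs 0 and any misclassification costs delta_ell. This interpretation and the function name zero_delta_loss are assumptions made for illustration; they are not taken from the package source.

```r
# Assumed reading of the 0/delta_ell loss: 0 if the prediction is correct,
# delta_ell otherwise. zero_delta_loss is a hypothetical name.
zero_delta_loss <- function(y_true, y_pred, delta_ell = 3600) {
  stopifnot(delta_ell > 0)  # only positive values are allowed
  ifelse(y_true == y_pred, 0, delta_ell)
}

# Per-document losses for three documents, the second one misclassified.
losses <- zero_delta_loss(c(1, 2, 3), c(1, 3, 3))
```

Under this reading, a larger delta_ell penalizes misclassified training documents more heavily.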

supervised

If the value is TRUE, the model is a supervised MedSTC; if FALSE, the model is the unsupervised STC.

primal_svm

Only takes effect when supervised is TRUE. If the value is 1, the loss-augmented prediction (i.e., sub-gradient) is used to update document codes; otherwise the gradient with Lagrangian multipliers is used.

var_max_iter

The maximum number of iterations of coordinate descent for a single document.

convergence

The convergence criterion for coordinate descent. Stop if (objective_old - objective) / abs(objective_old) is less than this value (or after the maximum number of iterations). Note that "objective" is the objective value for a single document.
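
The stopping rule above can be sketched in base R as follows. The function has_converged and the sequence of objective values are hypothetical, used only to show how the relative-decrease test combines with an iteration cap such as var_max_iter.

```r
# Relative-decrease test, written exactly as stated in the description.
has_converged <- function(objective_old, objective, tol = 1e-4) {
  (objective_old - objective) / abs(objective_old) < tol
}

# Hypothetical objective values for one document over successive iterations.
objectives <- c(100, 60, 59.999)
max_iter <- 20

iter <- 1
while (iter < length(objectives) && iter < max_iter) {
  iter <- iter + 1
  # Stop as soon as the relative decrease falls below the tolerance.
  if (has_converged(objectives[iter - 1], objectives[iter])) break
}
```

Note that, as written, the test is also satisfied when the objective increases, since the relative decrease is then negative.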

em_max_iter

The maximum number of outer iterations, each consisting of hierarchical sparse coding, dictionary learning, and SVM training (for supervised MedSTC).

em_convergence

The convergence criterion for the outer loop bounded by em_max_iter. Stop if (objective_old - objective) / abs(objective_old) is less than this value (or after the maximum number of iterations). Note that "objective" is the objective value for the whole corpus.

svm_alg_type

If set to 0 then the n-slack multi-class SVM is used. If set to 2, then the 1-slack multi-class SVM is used. In our testing, the 1-slack SVM is faster.

output_dir

A directory for writing intermediate results. Directory is removed after the calculation is done, but is needed during the run.

Value

model

A model object of class medSTC, which holds a state list with five elements. The first two elements store the model parameter state after training completes. The third element is LogProbabilityOfWordsForTopics, which can be used to assign words to topics. The fourth and fifth elements are Eta and Mu (see the paper listed under References). The model also stores the original parameter values.

Author(s)

Jun Zhu (junzhu@cs.cmu.edu), Aykut Firat (aykutfirat@gmail.com)

References

Jun Zhu and Eric P. Xing. Sparse Topical Coding. In Proc. of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), Barcelona, Spain, 2011.

Examples

## Not run: demo(medSTC)

medSTC documentation built on May 29, 2017, 5:13 p.m.