textCluster: Cluster a Term-Document Matrix

View source: R/cluster.R

textCluster    R Documentation

Cluster a Term-Document Matrix

Description

Group documents (columns) into k clusters so that the documents within a cluster are the most similar to one another according to their text distance. Documents with no terms are assigned to the last cluster.
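Because documents with no terms are handled specially, it can help to identify them before clustering. The following is a minimal base R sketch, assuming the term-document matrix is an ordinary numeric matrix of non-negative counts; the toy matrix and the name empty are illustrative and not part of the package.

```r
# Toy term-document matrix: terms on rows, documents on columns.
# matrix() fills column by column, so d2 is the all-zero document.
tdm <- matrix(c(1, 0, 2,
                0, 0, 0,
                0, 3, 1),
              nrow = 3,
              dimnames = list(c("A", "B", "C"), c("d1", "d2", "d3")))

# With non-negative counts, a document has no terms when its column sums to zero.
empty <- colSums(tdm) == 0

# Names of the documents that contain no terms.
names(which(empty))
```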

Usage

textCluster(tdm, k, mx = 100, md = 5 * k)

Arguments

tdm

A term-document matrix with terms in the rows and documents in the columns.

k

A positive integer giving the number of clusters.

mx

Maximum number of iterations (default 100).

md

Maximum number of documents to use for the initial setup (default 5*k).
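As an illustration of the layout expected for tdm, here is a small base R sketch that builds a term-document matrix from raw text. The corpus, the whitespace tokenization, and the variable names are assumptions made for this example; the package may provide its own tools for this step.

```r
# Hypothetical toy corpus; the names become the document (column) names.
docs <- c(d1 = "apple banana apple",
          d2 = "banana cherry",
          d3 = "cherry cherry apple")

# Split each document into lowercase words on whitespace.
tokens <- strsplit(tolower(docs), "\\s+")

# Vocabulary: every distinct term, sorted alphabetically.
terms <- sort(unique(unlist(tokens)))

# Count each term in each document; the result has one column per document.
tdm <- vapply(tokens,
              function(w) tabulate(match(w, terms), nbins = length(terms)),
              integer(length(terms)))
rownames(tdm) <- terms
tdm
```

A matrix built this way has terms on the rows and documents on the columns, which is the orientation described for tdm above.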

Value

A textcluster object with three items: cluster, centroids, and size. cluster is a vector giving, for each column of tdm, the cluster it has been assigned to; centroids is a matrix in which each column is the centroid of a cluster; and size is a named vector with the size of each cluster.
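The sketch below shows one way the cluster and size items might be read. It uses a hand-built stand-in rather than a real result, and it assumes that the items can be accessed with $, that cluster follows the column order of tdm, and that size is named by cluster number; none of these details is confirmed by this page.

```r
# Stand-in for a returned object, built by hand for illustration only.
# A real result would come from a call such as textCluster(tdm, k).
res <- list(cluster = c(1, 1, 2, 2),
            size = c("1" = 2, "2" = 2))

# Hypothetical document names, in the same order as the columns of tdm.
doc_names <- c("d1", "d2", "d3", "d4")

# Group the document names by their assigned cluster.
members <- split(doc_names, res$cluster)
members

# Tabulating the assignments should reproduce the sizes.
table(res$cluster)
```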

Examples

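# 4 terms (rows A-D) by 4 documents (columns 1-4), filled column by column
M <- matrix(c(0, 1, 0, 2, 0, 10, 0, 14, 12, 0, 8, 0, 1, 0, 1, 0), nrow = 4)
colnames(M) <- 1:4
rownames(M) <- c("A", "B", "C", "D")
# Cluster the four documents into two groups
textCluster(M, 2)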

phm documentation built on June 8, 2022, 1:05 a.m.