approx_k: Approximate Number of Clusters for a Text Matrix


Description

Approximates the number of clusters for a text matrix using the Can & Ozkarahan (1990) formula: (m * n)/t, where m and n are the dimensions of the matrix (its numbers of rows and columns) and t is the number of non-zero elements in the matrix.
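The arithmetic behind the formula can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name approx_k_sketch and the toy matrix are hypothetical. The sketch returns the raw ratio and makes no assumption about how approx_k() converts it to an integer.

```r
# Hypothetical helper illustrating (m * n) / t; not the clustext source code.
approx_k_sketch <- function(x) {
    x <- as.matrix(x)
    m <- nrow(x)               # first dimension of the matrix
    n <- ncol(x)               # second dimension of the matrix
    n_nonzero <- sum(x != 0)   # t: number of non-zero elements
    (m * n) / n_nonzero
}

# Toy 4 x 5 count matrix with made-up values (7 non-zero cells)
toy <- matrix(
    c(1, 0, 2, 0, 0,
      0, 3, 0, 0, 1,
      0, 0, 0, 4, 0,
      1, 1, 0, 0, 0),
    nrow = 4, byrow = TRUE
)

approx_k_sketch(toy)  # (4 * 5) / 7, roughly 2.86
```

Because the numerator is the product of both dimensions, the ratio is the same whether the matrix is laid out as terms by documents or documents by terms. A sparser matrix (fewer non-zero cells) gives a larger estimate.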

Usage

approx_k(x, verbose = TRUE)

## S3 method for class 'TermDocumentMatrix'
approx_k(x, verbose = TRUE)

## S3 method for class 'DocumentTermMatrix'
approx_k(x, verbose = TRUE)

Arguments

x

A matrix, or a TermDocumentMatrix or DocumentTermMatrix object.

verbose

Logical. If TRUE, the determined value of k is printed.

Value

Returns an integer: the approximate number of clusters, k.

References

Can, F., Ozkarahan, E. A. (1990). Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases. ACM Transactions on Database Systems 15 (4): 483. doi:10.1145/99935.99938.

Examples

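library(clustext)  # provides approx_k()
library(gofastr)   # provides q_dtm()
library(dplyr)     # provides %>%

## Build a document-term matrix from the debate dialogue, then approximate k
presidential_debates_2012 %>%
    with(q_dtm(dialogue)) %>%
    approx_k()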

trinker/clustext documentation built on May 31, 2019, 8:41 p.m.