CTDM: Term Document Matrix


Description

Constructs a term-document matrix from Chinese text documents.

Usage

CTDM(doc, weighting, EngTermDeleted = TRUE, NumTermDeleted = TRUE,
  shortTermDeleted = TRUE)

Arguments

doc

The Chinese text document. A vector of Chinese strings.

weighting

The weighting function applied to the matrix. One of "binary", "count", "tf", or "tfidf". See Details.

EngTermDeleted

If TRUE, remove English terms from the text documents.

NumTermDeleted

If TRUE, remove numbers from the text documents.

shortTermDeleted

If TRUE, delete short words, i.e. those with nchar < 2.
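
The three deletion flags can be pictured as filters over the segmented terms. The following is a minimal base R sketch of that idea, not the package's actual code: the regular expressions and the names `terms`, `is_english`, `is_number`, `is_short`, and `kept` are assumptions made for illustration.

```r
# Hypothetical segmented terms; Chinese words are written as \u escapes
# so the snippet does not depend on the file encoding.
terms <- c("\u53f0\u7063",  # "Taiwan" (two characters)
           "weather",       # English term
           "2019",          # number
           "\u7684",        # one-character word
           "\u5929\u6c23")  # "weather" in Chinese (two characters)

# Assumed filters mirroring the three arguments (patterns are illustrative).
is_english <- grepl("^[A-Za-z]+$", terms)  # cf. EngTermDeleted
is_number  <- grepl("^[0-9]+$", terms)     # cf. NumTermDeleted
is_short   <- nchar(terms) < 2             # cf. shortTermDeleted

# With all three flags TRUE, only the two-character Chinese words remain.
kept <- terms[!(is_english | is_number | is_short)]
```

Setting a flag to FALSE corresponds to dropping that condition from the filter, which is why the English example on this page passes EngTermDeleted = FALSE.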

Details

This function runs Chinese word segmentation with jiebaR and builds a term-document matrix. Four weighting functions are available: "binary" sets a value to 1 if the term occurs in a document (0 otherwise), "count" is the number of times the term occurs in a document, "tf" is the term frequency, and "tfidf" is the term frequency-inverse document frequency.
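
To make the four weightings concrete, here is a small base R sketch that computes each one from hand-segmented toy documents. It does not call CTDM. The exact normalisation used by the package is not stated on this page, so the formulas below (tf as count divided by document length, idf as log2 of the number of documents over the document frequency) are assumptions, and all object names are hypothetical.

```r
# Toy documents, already split into terms (CTDM uses jiebaR for this step).
docs <- list(
  d1 = c("hello", "taiwan"),
  d2 = c("taiwan", "weather", "weather"),
  d3 = c("local", "weather")
)
terms <- unique(unlist(docs))

# "count": rows are terms, columns are documents.
count <- vapply(
  docs,
  function(d) tabulate(match(d, terms), nbins = length(terms)),
  integer(length(terms))
)
rownames(count) <- terms

# "binary": 1 if the term occurs in the document, 0 otherwise.
binary <- (count > 0) * 1L

# "tf" (assumed form): count divided by the document's total term count.
tf <- sweep(count, 2, colSums(count), "/")

# "tfidf" (assumed form): tf scaled by log2(N / document frequency).
doc_freq <- rowSums(count > 0)
idf <- log2(ncol(count) / doc_freq)
tfidf <- tf * idf  # idf is recycled down the rows, one value per term
```

Under these assumptions a term that appears in every document would get an idf of 0, so its tfidf weight vanishes regardless of how often it occurs.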

Author(s)

Jim Liu, Quan Gu

Examples

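library(CTM)
# Four short documents (English here, so EngTermDeleted is set to FALSE below)
a1 <- "hello taiwan"
b1 <- "world of tank"
c1 <- "taiwan weather"
d1 <- "local weather"
# Transpose the one-row data frame into a character matrix, one document per row
text1 <- t(data.frame(a1, b1, c1, d1))
# Build a tf-idf weighted term-document matrix, keeping English and short terms
tdm1 <- CTDM(doc = text1, weighting = "tfidf", EngTermDeleted = FALSE,
  shortTermDeleted = FALSE)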

CTM documentation built on May 1, 2019, 8:07 p.m.
