word_cor: Word Correlation in DTM/TDM


View source: R/word_cor.R

Description

Given a DTM, TDM, or matrix, the function computes the Pearson, Spearman, or Kendall correlation between pairs of words and filters the results by p value and by a minimum correlation value. It is a little more flexible than tm::findAssocs.

Usage

word_cor(x, word, type = "dtm", method = "kendall", p = NULL, min = NULL)

Arguments

x

a DocumentTermMatrix or TermDocumentMatrix object, or a matrix. If it is a matrix, you must specify its type with the argument type; in addition, NA is not allowed, and the rownames/colnames that are taken as words must not be NULL.

word

a character vector of words whose correlations in your data you want to know. If it is not a vector, the function will try to coerce it. Its length should not be larger than 200. The function only computes correlations for words that exist in the data; words not in the data are not included.

type

if it starts with "d" or "D", x represents a DTM; if it starts with "t" or "T", a TDM; other values are not valid. This is only used when x is a matrix. The default is "dtm".
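The rule described for type can be sketched in base R as follows. This is only an illustration of the documented behavior, not the package's source: to_dtm_matrix is a hypothetical helper name, and the checks simply mirror the requirements stated for x (no NA, non-NULL names on the dimension that holds the words).

```r
# Hypothetical helper (not part of chinese.misc): normalise a plain matrix
# so that documents are in rows and words are in columns.
to_dtm_matrix <- function(x, type = "dtm") {
  stopifnot(is.matrix(x), !anyNA(x))
  first <- tolower(substr(type, 1, 1))
  if (first == "d") {
    out <- x          # already documents x words
  } else if (first == "t") {
    out <- t(x)       # words are in rows, so transpose
  } else {
    stop('type must start with "d"/"D" or "t"/"T"')
  }
  if (is.null(colnames(out))) {
    stop("the dimension that holds the words must have names")
  }
  out
}

# Usage: a TDM-style matrix is transposed, a DTM-style matrix is kept as is
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("alpha", "apple", "cake")))
same <- to_dtm_matrix(m, type = "DTM")
back <- to_dtm_matrix(t(m), type = "T")
```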

method

the correlation index to compute. It can only be "pearson", "spearman", or "kendall" (the default). The method is passed to stats::cor.test.
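To make the role of stats::cor.test concrete, here is a minimal base R sketch that builds a correlation matrix and a p value matrix for every pair of columns in a DTM-like matrix. It is an assumption about the general approach, not the package's actual implementation: pairwise_cor is a hypothetical name, and exact = FALSE is chosen here only to sidestep tie warnings from the rank-based methods.

```r
# Hypothetical helper (not part of chinese.misc): pairwise correlation and
# p value matrices for the columns (words) of a documents x words matrix.
pairwise_cor <- function(m, method = "kendall") {
  words <- colnames(m)
  k <- length(words)
  r <- matrix(NA_real_, k, k, dimnames = list(words, words))
  pv <- r
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      # One test per pair of words; the diagonal is left as NA
      ct <- suppressWarnings(
        stats::cor.test(m[, i], m[, j], method = method, exact = FALSE)
      )
      r[i, j] <- r[j, i] <- unname(ct$estimate)
      pv[i, j] <- pv[j, i] <- ct$p.value
    }
  }
  list(cor = r, p = pv)
}

# Usage on a small random matrix
set.seed(1)
m <- matrix(sample(1:10, 100, replace = TRUE), nrow = 20,
            dimnames = list(NULL, c("alpha", "apple", "cake", "data", "r")))
res <- pairwise_cor(m, method = "kendall")
```

The sketch returns a two-element list in the same order as the Value section describes (correlations first, p values second), but the element names cor and p are illustrative only.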

p

if the p value of a correlation index is >= this value, the index will be converted to NA in the correlation matrix. The default is NULL, which means no filtering is done. Note: if both p and min are non-NULL, their relation is "or" rather than "and".

min

if the correlation index is smaller than this value, it will be converted to NA. The default is NULL, which means no filtering is done.

Value

a list. The 1st element is the correlation matrix with diagonal converted to NA. The 2nd element is the p value matrix with diagonal converted to NA.

Examples

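library(chinese.misc)
set.seed(1)
# A 20 x 5 matrix of random counts: 20 documents (rows), 5 words (columns)
s <- sample(1:10, 100, replace = TRUE)
m <- matrix(s, nrow = 20)
myword <- c("alpha", "apple", "cake", "data", "r")
colnames(m) <- myword
# Defaults: m is treated as a DTM, Kendall correlation, no filtering
mycor1 <- word_cor(m, myword)
# Pearson correlation, filtered by both min and p
mycor2 <- word_cor(m, myword, method = "pearson", min = 0.1, p = 0.4)
# Transposed matrix treated as a TDM, Spearman correlation, filtered by p
mt <- t(m)
mycor3 <- word_cor(mt, myword, type = "T", method = "spearman", p = 0.5)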

Example output

Warning messages:
1: In Sys.setlocale(category = "LC_COLLATE", s_right_locale) :
  OS reports request to set locale to "zh_CN.UTF-8" cannot be honored
2: In Sys.setlocale(category = "LC_CTYPE", s_right_locale) :
  OS reports request to set locale to "zh_CN.UTF-8" cannot be honored

chinese.misc documentation built on Sept. 13, 2020, 5:13 p.m.