Description
This package aims to make the basic tasks of Chinese text mining
more efficient. A manual in Chinese is available at
https://github.com/githubwwwjjj/chinese.misc.
Compared with other packages and functions, the package puts more weight
on the following three points:
(1) It helps save users' time.
(2) It helps reduce errors: it tolerates and corrects input errors where it can,
and gives meaningful error messages where it cannot.
(3) Although the functions in this package depend on tm and
stringi, several steps and the values of arguments have been
specially set to facilitate processing Chinese text.
For example, corp_or_dtm creates a corpus or a document term matrix.
Users only need to supply folder names or file names, and the function
automatically detects file encoding, segments terms, modifies texts,
and removes stop words.
txt2csv and csv2txt help convert texts between formats and do some data
cleaning. There are also functions for object class assertion and coercion.
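
To make the corpus-to-matrix steps described above concrete, here is a minimal sketch written with plain tm calls on English text. It is not the package's implementation: it skips encoding detection, Chinese word segmentation, and stop word removal, and both the choice of cleaning steps and the `wordLengths` setting are assumptions made for illustration.

```r
library(tm)

x <- c(
  "Hello, what do you want to drink?",
  "drink a bottle of milk",
  "drink a cup of coffee",
  "drink some water",
  "hello, drink a cup of coffee")

# Build a corpus from the character vector.
corp <- VCorpus(VectorSource(x))

# Typical cleaning steps (assumed here, not read from the package source).
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeNumbers)
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, stripWhitespace)

# Keep short terms such as "a"; tm's default drops terms shorter than 3 characters.
# This setting is an assumption of the sketch.
dtm_manual <- DocumentTermMatrix(corp, control = list(wordLengths = c(1, Inf)))

dim(dtm_manual)                          # documents x distinct terms
findFreqTerms(dtm_manual, lowfreq = 4)   # terms occurring at least 4 times
```

Every step here has to be written out and tuned by hand, which is the work that a single corp_or_dtm call is meant to save.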
Author(s)
Jiang Wu
Examples
library(chinese.misc)
require(tm)
# Chinese characters are not allowed here, so the
# examples use English text instead.
# Make a document term matrix in one step; the user
# needs to change only a few arguments.
x <- c(
"Hello, what do you want to drink?",
"drink a bottle of milk",
"drink a cup of coffee",
"drink some water",
"hello, drink a cup of coffee")
dtm <- corp_or_dtm(x, from = "v", type = "dtm")
# Coerce list containing data frames and other lists
df <- data.frame(matrix(c(66, 77, NA, 99), nrow = 2))
l <- list(a = 1:4, b = factor(c(10, 20, NA, 30)), c = c('x', 'y', NA, 'z'), d = df)
l2 <- list(l, l, cha = c('a', 'b', 'c'))
as.character2(l2)
Loading required package: tm
Loading required package: NLP
CHECKING ARGUMENTS
PROCESSING CHARACTER VECTOR
GENERATING CORPUS
PROCESSING CORPUS
MAKING DTM/TDM
DONE
Warning messages:
1: In Sys.setlocale(category = "LC_COLLATE", s_right_locale) :
OS reports request to set locale to "zh_CN.UTF-8" cannot be honored
2: In Sys.setlocale(category = "LC_CTYPE", s_right_locale) :
OS reports request to set locale to "zh_CN.UTF-8" cannot be honored
3: In tm_map.SimpleCorpus(corp, tm::removePunctuation) :
transformation drops documents
4: In tm_map.SimpleCorpus(corp, tm::removeNumbers) :
transformation drops documents
5: In tm_map.SimpleCorpus(corp, tm::content_transformer(tolower)) :
transformation drops documents
6: In tm_map.SimpleCorpus(corp, tm::stripWhitespace) :
transformation drops documents
[1] "1" "2" "3" "4" "10" "20" NA "30" "x" "y" NA "z" "66" "77" NA
[16] "99" "1" "2" "3" "4" "10" "20" NA "30" "x" "y" NA "z" "66" "77"
[31] NA "99" "a" "b" "c"
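
The flattened result above can be reproduced with a short base R function. This is only a sketch of the general idea (recursive coercion of nested lists to character); the helper name `flatten_chr` is hypothetical and the code is not taken from as.character2 itself.

```r
# Hypothetical helper: recursively coerce a nested list to one character vector.
flatten_chr <- function(x) {
  if (is.list(x)) {
    # Lists and data frames: flatten each element (or column) in turn.
    unlist(lapply(x, flatten_chr), use.names = FALSE)
  } else {
    # Atomic vectors and factors: factors yield their labels, NA stays NA.
    as.character(x)
  }
}

df <- data.frame(matrix(c(66, 77, NA, 99), nrow = 2))
l <- list(a = 1:4, b = factor(c(10, 20, NA, 30)), c = c('x', 'y', NA, 'z'), d = df)
l2 <- list(l, l, cha = c('a', 'b', 'c'))

res <- flatten_chr(l2)
length(res)   # 35, the same length as the output shown above
```

Calling `as.character` on each leaf rather than `unlist` on the whole list is what keeps the factor labels ("10", "20", "30") instead of their integer codes.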