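This function returns an eztfidf list that holds the cleaned documents (docs) together with helper functions (values, CosineSimVector, CosineSimMatrix) for inspecting tf-idf weights and cosine similarities.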
char_vector: A character vector of documents, passed to the tm package as a VectorSource. Values may be duplicated, but names must be unique.
replace_words: A named character vector. Each element name found in the documents is replaced with the corresponding element value.
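The name-to-value convention of replace_words can be illustrated with a small base R sketch. The helper apply_replacements below is hypothetical and is not part of the package. It assumes that names are matched as literal strings and applied in order, which the documentation does not confirm.

```r
# Hypothetical helper (not the package's implementation): apply each
# name -> value replacement in turn, treating names as literal strings.
apply_replacements <- function(docs, replace_words) {
  for (pattern in names(replace_words)) {
    docs <- gsub(pattern, replace_words[[pattern]], docs, fixed = TRUE)
  }
  docs
}

docs <- c(a = 'she-hulk', b = 'the silver surfer')
result <- apply_replacements(docs, c('-' = ' ', 'silver' = 'gold'))
print(result)  # values: 'she hulk', 'the gold surfer'
```

Here the name '-' is replaced by its value ' ' and the name 'silver' by 'gold', matching the direction described for replace_words.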
super_heroes <- c(
'The Flash', 'The HULK', 'she-hulk', 'ant-man', 'Ironman', 'BATMAN',
'superman', 'the green arrow', 'aqua-man', 'the silver surfer', 'green lantern'
)
names(super_heroes) <- super_heroes
super_heroes <- gsub('man$', '-MAN', super_heroes, ignore.case = TRUE) # custom cleaning
x <- eztfidf(
super_heroes, replace_words = c('-' = ' ', 'silver' = 'gold')
)
# Use numeric index or original names to see changes to docs
x$docs[1:10]
x$docs[c('The HULK','the silver surfer')]
# Inspect bag-of-words tfidf values as a list or matrix
x$values(c('the green arrow','green lantern'))
x$values(c(2,3,8,11), mode = 'matrix')
# Best matching values and cosine similarity matrix easily accessible
x$CosineSimVector(3, top = 3)
x$CosineSimVector('the green arrow', top = 3)
x$CosineSimMatrix(c(2,3,8,11))
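For readers who want to see what tf-idf values and a cosine similarity matrix represent, here is a minimal base R sketch. It is an illustration only: the whitespace tokenizer, the log(N / df) weighting, and all variable names are assumptions, and the weighting that eztfidf actually applies may differ.

```r
# Minimal base R sketch; the weighting used by eztfidf itself may differ.
docs <- c(
  d1 = 'the green arrow',
  d2 = 'green lantern',
  d3 = 'the silver surfer'
)

# Tokenize on whitespace and build a document-by-term count matrix.
tokens <- strsplit(tolower(docs), ' +')
vocab <- sort(unique(unlist(tokens)))
tf <- t(sapply(tokens, function(tok) {
  tabulate(match(tok, vocab), nbins = length(vocab))
}))
colnames(tf) <- vocab

# idf = log(N / document frequency); rarer terms receive larger weights.
idf <- log(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine similarity: dot products of rows scaled to unit length.
unit <- tfidf / sqrt(rowSums(tfidf^2))
cos_sim <- unit %*% t(unit)
print(round(cos_sim, 3))
```

Each document has similarity 1 with itself, and d2 and d3 share no terms, so their similarity is 0. Under this weighting d1 is closer to d2 than to d3.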