To test the performance of similarity calculation using vectorized method versus c loop methods.
require(quanteda, quietly = TRUE, warn.conflicts = FALSE) data(SOTUCorpus, package = "quantedaData") SOTUDfm <- dfm(SOTUCorpus, remove = stopwords("english"), stem = TRUE, verbose = FALSE) microbenchmark::microbenchmark(eucQuanteda = textstat_dist(SOTUDfm, method = "euclidean", margin = "documents"), eucProxy = proxy::dist(as.matrix(SOTUDfm), "euclidean", diag = FALSE, upper = FALSE, p = 2), eucStats = dist(as.matrix(SOTUDfm), method = "euclidean", diag = FALSE, upper = FALSE, p = 2), times = 20, unit = "relative")
require(quanteda, quietly = TRUE, warn.conflicts = FALSE) data(SOTUCorpus, package = "quantedaData") SOTUDfm <- dfm(SOTUCorpus, remove = stopwords("english"), stem = TRUE, verbose = FALSE) microbenchmark::microbenchmark(jacQuanteda = textstat_simil(SOTUDfm, method ="jaccard", margin = "documents", upper = TRUE), jacProxy = proxy::simil(as.matrix(SOTUDfm), "jaccard", diag = FALSE, upper = FALSE), times = 10, unit = "relative")
To test the performance of similarity calculation using parallel method versus serial method.
require(quanteda, quietly = TRUE, warn.conflicts = FALSE) data(SOTUCorpus, package = "quantedaData") SOTUDfm <- dfm(SOTUCorpus, remove = stopwords("english"), stem = TRUE, verbose = FALSE) microbenchmark::microbenchmark(manhaQuanteda = textstat_dist(SOTUDfm, method ="manhattan", margin = "documents", upper = FALSE), manhaProxy = proxy::dist(as.matrix(SOTUDfm), "Manhattan", diag = FALSE, upper = FALSE), times = 10, unit = "relative")
load("/home/kohei/Documents/Brexit/Analysis/data_dfm_guardian.RData") ndoc(data_dfm_guardian) gud_dist <- textstat_dist(data_dfm_guardian[1:100], method ="manhattan", margin = "documents", upper = FALSE) summary(gud_dist)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.