Weight the feature frequencies in a dfm
Returns a document-feature matrix with the feature frequencies weighted according to one of several common methods.
document-feature matrix created by dfm
a label of the weight type, or a named numeric vector of values to apply to the dfm. One of:
not currently used. For finer grained control, consider calling
constant added to the dfm cells for smoothing, default is 1
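As a rough illustration of what the weight types named in the examples ("relFreq", "relMaxFreq", "logFreq", "tfidf") might compute, here is a minimal base R sketch on a small dense count matrix. The formulas follow the textbook definitions in Manning et al. (2008). The log base and the way the smoothing constant enters are assumptions, and the package's own implementation may differ. The matrix `counts` and all variable names are hypothetical.

```r
# Toy count matrix: rows are documents, columns are features (hypothetical data).
counts <- matrix(c(1, 1, 1, 0,
                   1, 1, 2, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("text1", "text2"),
                                 c("apple", "better", "banana", "much")))

# Relative frequency: each count divided by its document (row) total.
rel_freq <- counts / rowSums(counts)

# Relative to the maximum: each count divided by the largest count in its row.
rel_max_freq <- counts / apply(counts, 1, max)

# Log frequency. ASSUMPTION: the smoothing constant is added to every cell
# before taking the natural log, so zero counts map to log(1) = 0.
smoothing <- 1
log_freq <- log(counts + smoothing)

# Textbook tf-idf: count times log10(N / document frequency).
doc_freq <- colSums(counts > 0)             # number of documents containing each feature
idf <- log10(nrow(counts) / doc_freq)       # features present in every document get idf 0
tf_idf <- sweep(counts, 2, idf, `*`)        # scale each column by its idf
```

In this toy matrix every feature except "much" occurs in both documents, so only the "much" column keeps a nonzero tf-idf value.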
This converts the matrix from sparse to dense format, so it may exceed available memory, depending on the size of the input matrix.
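To get a feel for when the dense conversion becomes a problem, the following back-of-the-envelope sketch estimates the size of a dense double-precision matrix, which R stores at 8 bytes per cell. The helper `dense_bytes` is a hypothetical name introduced here for illustration and is not part of the package.

```r
# Estimated storage for a dense double-precision matrix (8 bytes per cell).
# dense_bytes() is a hypothetical helper, not a package function.
dense_bytes <- function(n_docs, n_features) {
  n_docs * n_features * 8
}

# Example: 10,000 documents by 50,000 features.
bytes <- dense_bytes(10000, 50000)
gib <- bytes / 1024^3   # about 3.7 GiB, regardless of how sparse the counts are
```

A sparse representation of the same matrix only stores the nonzero cells, which is why the dense form can be far larger than the input.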
The dfm with weighted values.
Paul Nulty and Kenneth Benoit
Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Vol. 1. Cambridge: Cambridge University Press, 2008.
dtm <- dfm(inaugCorpus)
x <- apply(dtm, 1, function(tf) tf/max(tf))
topfeatures(dtm)
normDtm <- weight(dtm, "relFreq")
topfeatures(normDtm)
maxTfDtm <- weight(dtm, type="relMaxFreq")
topfeatures(maxTfDtm)
logTfDtm <- weight(dtm, type="logFreq")
topfeatures(logTfDtm)
tfidfDtm <- weight(dtm, type="tfidf")
topfeatures(tfidfDtm)

# combine these methods for more complex weightings, e.g. as in Section 6.4
# of Introduction to Information Retrieval
head(logTfDtm <- weight(dtm, type="logFreq"))
head(tfidf(logTfDtm, normalize = FALSE))

# apply numeric weights
str <- c("apple is better than banana", "banana banana apple much better")
weights <- c(apple = 5, banana = 3, much = 0.5)
(mydfm <- dfm(str, ignoredFeatures = stopwords("english"), verbose = FALSE))
weight(mydfm, weights)
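The numeric-weights case can be sketched in base R as a column-wise multiplication. This is only an illustration under stated assumptions: the count matrix is typed in by hand instead of being produced by dfm, and features that are not named in the weight vector are assumed to keep their original counts, which this page does not confirm.

```r
# Hand-built counts standing in for a dfm (hypothetical data).
counts <- matrix(c(1, 1, 1, 0,
                   1, 1, 2, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("text1", "text2"),
                                 c("apple", "better", "banana", "much")))
weights <- c(apple = 5, banana = 3, much = 0.5)

# Start from a weight of 1 for every feature, then overwrite the named ones.
# ASSUMPTION: unnamed features ("better" here) are left unweighted.
w <- setNames(rep(1, ncol(counts)), colnames(counts))
matched <- intersect(names(weights), colnames(counts))
w[matched] <- weights[matched]

# Multiply each feature column by its weight.
weighted <- sweep(counts, 2, w, `*`)
```

Under these assumptions, "banana" in the second text goes from a count of 2 to a weighted value of 6, while "better" stays at 1 in both texts.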