dfm_weight | R Documentation |
Weight the feature frequencies in a dfm
dfm_weight(
x,
scheme = c("count", "prop", "propmax", "logcount", "boolean", "augmented", "logave"),
weights = NULL,
base = 10,
k = 0.5,
smoothing = 0.5,
force = FALSE
)
dfm_smooth(x, smoothing = 1)
x |
document-feature matrix created by dfm |
scheme |
a label of the weight type:
|
weights |
if |
base |
base for the logarithm when |
k |
the k for the augmentation when |
smoothing |
constant added to the dfm cells for smoothing, default is 1
for |
force |
logical; if |
dfm_weight
returns the dfm with weighted values. Note the
because the default weighting scheme is "count"
, simply calling this
function on an unweighted dfm will return the same object. Many users will
want the normalized dfm consisting of the proportions of the feature counts
within each document, which requires setting scheme = "prop"
.
dfm_smooth
returns a dfm whose values have been smoothed by
adding the smoothing
amount. Note that this effectively converts a
matrix from sparse to dense format, so may exceed memory requirements
depending on the size of your input matrix.
Manning, C.D., Raghavan, P., & Schütze, H. (2008). An Introduction to Information Retrieval. Cambridge: Cambridge University Press. https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf
docfreq()
dfmat1 <- dfm(tokens(data_corpus_inaugural))
dfmat2 <- dfm_weight(dfmat1, scheme = "prop")
topfeatures(dfmat2)
dfmat3 <- dfm_weight(dfmat1)
topfeatures(dfmat3)
dfmat4 <- dfm_weight(dfmat1, scheme = "logcount")
topfeatures(dfmat4)
dfmat5 <- dfm_weight(dfmat1, scheme = "logave")
topfeatures(dfmat5)
# combine these methods for more complex dfm_weightings, e.g. as in Section 6.4
# of Introduction to Information Retrieval
head(dfm_tfidf(dfmat1, scheme_tf = "logcount"))
# smooth the dfm
dfmat <- dfm(tokens(data_corpus_inaugural))
dfm_smooth(dfmat, 0.5)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.