weighting_functions: Weighting Functions

weighting_functions    R Documentation

Weighting Functions

Description

Local and global weighting functions.

Usage

lw_tf(x)

lw_raw(x)

lw_log(x)

lw_bin(x)

gw_idf(x)

gw_idf_alt(x)

gw_gfidf(x)

gw_nor(x)

gw_ent(x)

gw_bin(x)

gw_raw(x)

Arguments

x

A numeric matrix.

Details

Many local and global weighting functions exist. In this package, local weighting functions are prefixed with lw_ and global weighting functions with gw_, so that users can define their own weighting functions using the same prefixes.
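As an illustration of the naming convention described above, here is a minimal sketch of a user-defined local weighting function. The name lw_sqrt, the square-root weighting, and the toy matrix m are all hypothetical and not part of svs; how a function defined this way is picked up by other functions in the package is not stated here.

```r
# Hypothetical user-defined local weighting function (not part of svs):
# square-root damping applied to every cell of the matrix.
lw_sqrt <- function(x) {
  sqrt(x)
}

# Toy document-term matrix (assumed data): 3 documents (rows), 2 terms (columns).
m <- matrix(c(0, 1, 4,
              9, 0, 16),
            nrow = 3,
            dimnames = list(c("d1", "d2", "d3"), c("t1", "t2")))

# The result has the same dimensions as the input.
lw_sqrt(m)
```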

Local weighting functions (i.e. weighting every cell in the matrix):

lw_tf

Term frequency: f(x) = x.

lw_raw

Raw frequency, which is the same as the term frequency: f(x) = x.

lw_log

Logarithm: f(x) = log(x + 1).

lw_bin

Binary: f(x) = 1 if x > 0 and 0 otherwise.
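The local weighting formulas above can be sketched in base R on a small matrix. This is only an illustration of the stated formulas, not the package's implementation; the toy matrix m and the *_sketch names are assumptions, and log is taken to be R's natural logarithm.

```r
# Toy document-term matrix (assumed data): 3 documents (rows), 2 terms (columns).
m <- matrix(c(2, 0, 1,
              0, 3, 0),
            nrow = 3,
            dimnames = list(c("d1", "d2", "d3"), c("t1", "t2")))

# Term (raw) frequency: f(x) = x, i.e. the matrix is left unchanged.
tf_sketch <- m

# Logarithm: f(x) = log(x + 1), so zero counts stay at zero.
log_sketch <- log(m + 1)

# Binary: 1 if x > 0 and 0 otherwise.
bin_sketch <- (m > 0) * 1

log_sketch
bin_sketch
```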

Global weighting functions (i.e. weighting the columns of the matrix; these functions therefore behave as expected for a document-term matrix, with the documents as the rows and the terms as the columns):

gw_idf

Inverse document frequency: f(x) = log(nrow(x) / n + 1), where n is the number of rows in which the column is > 0.

gw_idf_alt

Alternative definition of the inverse document frequency: f(x) = log(nrow(x) / n) + 1, where n is the number of rows in which the column is > 0.

gw_gfidf

Global frequency multiplied by inverse document frequency: f(x) = colSums(x) / n, where n is the number of rows in which the column is > 0.

gw_nor

Normal(ized) frequency: f(x) = x / colSums(x^2).

gw_ent

Entropy: f(x) = 1 + the relative Shannon entropy.

gw_bin

Binary: f(x) = 1.

gw_raw

Raw, which is the same as binary: f(x) = 1.
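The three count-based global formulas above (idf, its alternative definition, and gfidf) can be sketched in base R as per-column weights. This is a hedged illustration rather than the package's implementation: the toy matrix m and the *_sketch names are assumptions, log is taken to be R's natural logarithm, and the "+ 1" is read with R's usual operator precedence. The documented functions return a numeric matrix, whereas this sketch stops at one weight per column.

```r
# Toy document-term matrix (assumed data): 3 documents (rows), 2 terms (columns).
m <- matrix(c(2, 0, 1,
              0, 3, 0),
            nrow = 3,
            dimnames = list(c("d1", "d2", "d3"), c("t1", "t2")))

# n: for each column, the number of rows in which the column is > 0.
n <- colSums(m > 0)

# Inverse document frequency: log(nrow(x) / n + 1).
idf_sketch <- log(nrow(m) / n + 1)

# Alternative definition: log(nrow(x) / n) + 1.
idf_alt_sketch <- log(nrow(m) / n) + 1

# Global frequency multiplied by inverse document frequency: colSums(x) / n.
gfidf_sketch <- colSums(m) / n

rbind(idf = idf_sketch, idf_alt = idf_alt_sketch, gfidf = gfidf_sketch)
```

The two idf variants differ only in where the 1 is added: inside the logarithm in the first case, outside it in the second.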

Value

A numeric matrix.

See Also

fast_lsa.

Examples

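library(svs)

# Read the example data file shipped with the svs package
SndT_Fra <- read.table(system.file("extdata", "SndT_Fra.txt", package = "svs"),
   header = TRUE, sep = "\t", quote = "\"", encoding = "UTF-8",
   stringsAsFactors = FALSE)
# Cross-tabulate the columns of the data frame into a frequency table
tab_SndT_Fra <- table(SndT_Fra)
# Local weighting: log(x + 1) for every cell
lw_log(tab_SndT_Fra)
# Global weighting: inverse document frequency for the columns
gw_idf(tab_SndT_Fra)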

svs documentation built on June 24, 2024, 5:07 p.m.