Weighting Functions

Description

Local and global weighting functions.

Usage

lw_tf(x)
lw_raw(x)
lw_log(x)
lw_bin(x)
gw_idf(x)
gw_idf_alt(x)
gw_gfidf(x)
gw_nor(x)
gw_ent(x)
gw_bin(x)
gw_raw(x)

Arguments

x

A numeric matrix.

Details

There are many local and global weighting functions. In this package, local weighting functions are prefixed with lw_ and global weighting functions with gw_. Users can define their own weighting functions by following the same naming convention.
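As an illustration of the naming convention, a user-defined local weighting function might look like the sketch below. The name lw_sqrt and the square-root damping are hypothetical and not part of the package; how such a function is passed on to other functions in the package is not shown here.

```r
# Hypothetical user-defined local weighting function (not part of svs):
# dampens large counts by taking the square root of every cell.
lw_sqrt <- function(x) {
  x <- as.matrix(x)  # accept tables and matrices alike
  sqrt(x)
}

# Small made-up count matrix for demonstration.
m <- matrix(c(4, 0, 9, 1), nrow = 2)
lw_sqrt(m)
```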

Local weighting functions (i.e. weighting every cell in the matrix):

  • lw_tf Term frequency: f(x) = x.

  • lw_raw Raw frequency, which is the same as the term frequency: f(x) = x.

  • lw_log Logarithm: f(x) = log(x + 1).

  • lw_bin Binary: f(x) = 1 if x > 0 and 0 otherwise.
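The local formulas above can be sketched in base R as follows. The sketch_* names and the toy matrix are assumptions made for illustration; they restate the formulas and are not the package's source code.

```r
# Toy document-term matrix: 3 documents (rows) by 4 terms (columns).
x <- matrix(c(2, 0, 1, 0,
              0, 3, 1, 0,
              1, 1, 0, 4),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("doc", 1:3), paste0("term", 1:4)))

# Hypothetical stand-ins for the formulas listed above.
sketch_tf  <- function(x) x            # f(x) = x
sketch_log <- function(x) log(x + 1)   # f(x) = log(x + 1)
sketch_bin <- function(x) (x > 0) * 1  # f(x) = 1 if x > 0, else 0

sketch_log(x)
sketch_bin(x)
```

Each function works cell by cell, so the result has the same dimensions as the input.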

Global weighting functions weight the columns of the matrix. They therefore behave as expected for a document-term matrix, i.e. with the documents as the rows and the terms as the columns:

  • gw_idf Inverse document frequency: f(x) = log(nrow(x) / n + 1), where n = the number of rows in which the column is greater than 0.

  • gw_idf_alt Alternative definition of the inverse document frequency: f(x) = log(nrow(x) / n) + 1, where n = the number of rows in which the column is greater than 0.

  • gw_gfidf Global frequency multiplied by inverse document frequency: f(x) = colSums(x) / n, where n = the number of rows in which the column is greater than 0.

  • gw_nor Normal(ized) frequency: f(x) = x / colSums(x^2).

  • gw_ent Entropy: f(x) = 1 + the relative Shannon entropy.

  • gw_bin Binary: f(x) = 1.

  • gw_raw Raw, which is the same as binary: f(x) = 1.
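A minimal base R sketch of the three document-frequency formulas above, computed on a made-up document-term matrix. The variable names and the expand_cols helper are hypothetical; expanding the per-column weights to a full matrix is an assumption made here so that they can be multiplied cell by cell with a locally weighted matrix, and it is not taken from the package source.

```r
# Toy document-term matrix: 3 documents (rows) by 4 terms (columns).
x <- matrix(c(2, 0, 1, 0,
              0, 3, 1, 0,
              1, 1, 0, 4),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("doc", 1:3), paste0("term", 1:4)))

# n: number of rows (documents) in which each column (term) is > 0.
n <- colSums(x > 0)

idf     <- log(nrow(x) / n + 1)  # inverse document frequency
idf_alt <- log(nrow(x) / n) + 1  # alternative definition
gfidf   <- colSums(x) / n        # global frequency times idf

# Hypothetical helper: repeat per-column weights down the rows of x.
expand_cols <- function(w, x) {
  matrix(w, nrow = nrow(x), ncol = ncol(x), byrow = TRUE,
         dimnames = dimnames(x))
}

# Combine a local weighting (log) with a global weighting (idf).
weighted <- log(x + 1) * expand_cols(idf, x)
weighted
```

A term that occurs in fewer documents (term4 in this toy matrix) receives a larger idf weight than terms spread over more documents.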

Value

A numeric matrix.

See Also

fast_lsa.

Examples

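# Read the example data set shipped with the svs package.
SndT_Fra <- read.table(system.file("extdata", "SndT_Fra.txt", package = "svs"),
   header = TRUE, sep = "\t", quote = "\"", encoding = "UTF-8")
# Cross-tabulate the data into a frequency table.
tab.SndT_Fra <- table(SndT_Fra)
# Apply a local (logarithmic) and a global (idf) weighting function.
lw_log(tab.SndT_Fra)
gw_idf(tab.SndT_Fra)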