calculateTfidf: Calculate a term frequency / inverse document frequency for...

View source: R/tidyTFIDF.R

calculateTfidf    R Documentation

Calculate a term frequency / inverse document frequency for concepts associated with a grouping

Description

The grouped data frame acts as a "document" from the perspective of the TF-IDF calculation, although each group might instead represent something else, such as a person. TODO: fix methods of calculating tfidf

Usage

calculateTfidf(
  groupedDf,
  sampleVars,
  countVar = NULL,
  idfDf = NULL,
  k1 = 1.2,
  b = 0.95
)

Arguments

groupedDf

a dataframe whose grouping defines the "term"

sampleVars

the column(s) that contain the unique id of a sample, i.e. traditionally a "document" but possibly a patient, escaped by vars(...). This can include an outcome variable.

countVar

a field that contains a count. If this is given, it is assumed that each concept and document combination is unique.

idfDf

an optional data frame containing idf information from this or another corpus

k1

default 1.2 - Okapi BM25 parameter k1 (term frequency saturation)

b

default 0.95 - Okapi BM25 parameter b (document length normalisation)

Value

a data frame with tfidf stats for each concept in each group (i.e. document)
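
To illustrate what the k1 and b arguments control, the following is a minimal base R sketch of the textbook Okapi BM25 weighting applied to a toy long-format table. It does not call calculateTfidf and may not match the package's internal calculation. The data, the column names (doc, concept, n) and the choice of the "+ 1" idf variant are all assumptions made for illustration.

```r
# Toy long-format counts: one row per (document, concept) pair.
# All names and values here are hypothetical.
counts <- data.frame(
  doc     = c("d1", "d1", "d2", "d2", "d3"),
  concept = c("fever", "cough", "fever", "rash", "fever"),
  n       = c(3, 1, 1, 2, 4),
  stringsAsFactors = FALSE
)

k1 <- 1.2   # term frequency saturation (same default as calculateTfidf)
b  <- 0.95  # document length normalisation (same default as calculateTfidf)

# Corpus-level statistics
nDocs   <- length(unique(counts$doc))           # number of documents
docLen  <- tapply(counts$n, counts$doc, sum)    # total count per document
avgLen  <- mean(docLen)                         # average document length
docFreq <- tapply(counts$doc, counts$concept,   # documents containing each concept
                  function(d) length(unique(d)))

# Inverse document frequency, using the "+ 1" variant that stays non-negative
idf <- log((nDocs - docFreq + 0.5) / (docFreq + 0.5) + 1)

# Look up per-row values by name (as.character guards against factor indexing)
rowLen <- as.numeric(docLen[as.character(counts$doc)])
rowIdf <- as.numeric(idf[as.character(counts$concept)])

# BM25 score: idf multiplied by a saturating, length-normalised term frequency
lengthNorm  <- 1 - b + b * rowLen / avgLen
counts$bm25 <- rowIdf * counts$n * (k1 + 1) / (counts$n + k1 * lengthNorm)

print(counts)
```

In this toy corpus "fever" occurs in every document, so its idf is small. As a result "cough" scores higher than "fever" in d1, even though "fever" has the larger raw count there. Raising k1 lets large counts contribute more before saturating. Lowering b towards 0 reduces the penalty applied to long documents.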


terminological/tidy-info-stats documentation built on Nov. 19, 2022, 11:23 p.m.