term_day_dist: Calculate statistics for term occurrence across days

View source: R/dtm_functions.r

term_day_dist {RNewsflow}    R Documentation

Calculate statistics for term occurrence across days

Description

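For each term in a document-term matrix, calculate statistics that describe how its occurrences are distributed across days, based on the date of each document.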

Usage

term_day_dist(dtm, meta = NULL, date.var = "date")

Arguments

dtm

A quanteda dfm. Alternatively, a DocumentTermMatrix from the tm package can be used, but then the meta parameter must be specified manually.

meta

If dtm is a quanteda dfm and meta is NULL (the default), docvars(dtm) is used to obtain the metadata. Otherwise, the user must supply meta as a data.frame whose rows match the rows of the dtm (i.e., each row describes one document, in the same order).
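When meta has to be supplied by hand (for example, with a tm DocumentTermMatrix), its rows must be put in the same order as the dtm rows. The following is a minimal base-R sketch of one way to do this. It assumes that the document ids are the matrix row names and that meta stores them in a column called `doc_id` (a hypothetical name). A small dense matrix stands in for the dtm so that the example runs on its own.

```r
# Stand-in for a document-term matrix: rows are documents, named by document id
m <- matrix(c(1, 0, 2,
              0, 3, 1,
              2, 1, 0), nrow = 3, byrow = TRUE,
            dimnames = list(c("doc3", "doc1", "doc2"),
                            c("economy", "election", "storm")))

# Metadata stored in a different order; doc_id is a hypothetical column name
meta <- data.frame(
  doc_id = c("doc1", "doc2", "doc3"),
  date   = as.POSIXct(c("2023-01-01 09:00:00", "2023-01-01 15:30:00",
                        "2023-01-02 08:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Position of each matrix row's document in meta; fail loudly if one is missing
idx <- match(rownames(m), meta$doc_id)
stopifnot(!anyNA(idx))

# Reorder meta so that row i describes the same document as row i of m
meta <- meta[idx, ]
rownames(meta) <- NULL
```

Matching on ids with match(), rather than relying on both objects already being in the same order, turns a silent misalignment into an explicit error.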

date.var

The name of the meta column specifying the document date. Default is "date". The values should be of type POSIXlt or POSIXct.
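If the date column is stored as text, it can be converted before calling term_day_dist. The following is a base-R sketch. The column name `date` matches the default date.var. The explicit tz = "UTC" is an assumption, chosen so that the calendar day each document falls on does not depend on the local time zone of the R session.

```r
# Example metadata whose date column is stored as text
meta <- data.frame(
  doc_id = c("a", "b", "c"),
  date   = c("2023-05-01 08:15:00", "2023-05-01 22:40:00", "2023-05-02 06:05:00"),
  stringsAsFactors = FALSE
)

# Convert to POSIXct, as date.var expects; the time zone is set explicitly (assumption)
meta$date <- as.POSIXct(meta$date, tz = "UTC")
stopifnot(inherits(meta$date, "POSIXct"))

# Calendar day of each document in that time zone
as.Date(meta$date, tz = "UTC")
```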

Value

A data.frame with statistics for each term.

  • freq: The number of times a term occurred

  • doc.freq: The number of documents in which a term occurred

  • days.n: The number of days on which a term occurred

  • days.pct: The percentage of days on which a term occurred

  • days.entropy: The entropy of the distribution of term frequency across days

  • days.entropy.norm: days.entropy normalized so that a value of 1 indicates a discrete uniform distribution (the term is spread evenly over all days)
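To make these definitions concrete, here is a base-R sketch that computes the same kinds of statistics for a small dense matrix. The helper `term_day_stats()` is hypothetical and is not the package's implementation. It assumes that days.pct and the normalization count every calendar day from the first to the last document. It also uses the natural logarithm for the entropy. The base cancels out in days.entropy.norm, because the entropy and its maximum, the log of the number of days, use the same base.

```r
# Hypothetical helper, NOT part of RNewsflow: sketch of the per-term statistics
# for a dense documents x terms matrix `m` and POSIXct document dates `dates`
term_day_stats <- function(m, dates) {
  day <- as.Date(dates, tz = "UTC")
  # ASSUMPTION: all calendar days from the first to the last document are counted
  n_days <- length(seq(min(day), max(day), by = "day"))

  # Sum term counts per observed day: rows = days, columns = terms
  by_day <- rowsum(m, group = as.character(day))

  # Shannon entropy (natural log) of each term's distribution over days;
  # days with zero frequency contribute nothing
  entropy <- apply(by_day, 2, function(f) {
    p <- f[f > 0] / sum(f)
    -sum(p * log(p))
  })

  data.frame(
    term              = colnames(m),
    freq              = colSums(m),
    doc.freq          = colSums(m > 0),
    days.n            = colSums(by_day > 0),
    days.pct          = 100 * colSums(by_day > 0) / n_days,  # ASSUMPTION: 0-100 scale
    days.entropy      = entropy,
    # ASSUMPTION: divide by the maximum possible entropy, log(n_days), so 1 = uniform
    days.entropy.norm = entropy / log(n_days),
    row.names         = NULL,
    stringsAsFactors  = FALSE
  )
}

# Toy data: four documents spread over three days
m <- matrix(c(2, 0, 1,
              1, 1, 0,
              0, 1, 1,
              0, 1, 0), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), c("storm", "vote", "tax")))
dates <- as.POSIXct(c("2023-05-01 08:00:00", "2023-05-01 17:00:00",
                      "2023-05-02 09:00:00", "2023-05-03 12:00:00"), tz = "UTC")

res <- term_day_stats(m, dates)
res
```

In this toy example "vote" occurs once on each of the three days, so its days.entropy.norm is 1. "storm" is confined to a single day and gets 0. The normalization is undefined when all documents fall on a single day, because log(1) is 0.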

Examples

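library(RNewsflow)  # provides term_day_dist() and the example data rnewsflow_dfm
tdd = term_day_dist(rnewsflow_dfm, date.var='date')
head(tdd)  # first rows of the per-term statistics
tail(tdd)  # last rows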

RNewsflow documentation built on May 31, 2023, 6:53 p.m.