
View source: R/window_comparison.r

term_innovation    R Documentation

Experimental: Convert DTM scores to a term innovation score, based on changes in term use over time

Description

For each term in m, usage in the period before the document date is compared with usage in the period after it (using a chi-squared test) to determine whether usage increased.
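The comparison described above can be illustrated for a single term with a minimal base-R sketch. It is not the RNewsflow implementation: the helpers `term_shift()` and `is_innovative()` are hypothetical, and the sketch assumes that the "before" and "after" usage of a term is summarized as a 2x2 table (term vs. all other tokens, per window), that the ratio is computed on smoothed raw counts, and that a term only counts as innovative when both `min_chi` and `min_ratio` are met. The package may build its statistic differently.

```r
# Hypothetical sketch (not RNewsflow's implementation): test whether one
# term's usage increased from a "before" window to an "after" window.
term_shift <- function(before_term, before_total, after_term, after_total,
                       smooth = 1) {
  # 2x2 contingency table: rows = window (before, after),
  # columns = occurrences of the term vs. all other tokens
  tab <- matrix(c(before_term, before_total - before_term,
                  after_term,  after_total  - after_term),
                nrow = 2, byrow = TRUE)
  # Pearson chi-squared statistic without continuity correction
  chi <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  # Smoothed ratio of after-window to before-window counts; adding
  # `smooth` keeps the ratio finite when the term is absent before
  ratio <- (after_term + smooth) / (before_term + smooth)
  c(chi = unname(chi), ratio = ratio)
}

# Assumed rule: a term is "innovative" only if both thresholds are met
is_innovative <- function(res, min_chi = 5.024, min_ratio = 2) {
  unname(res["chi"] >= min_chi && res["ratio"] >= min_ratio)
}

# A term used 2 times in 1000 tokens before, and 20 times in 1000 after
res <- term_shift(before_term = 2, before_total = 1000,
                  after_term = 20, after_total = 1000)
res
is_innovative(res)
```

Requiring a minimum ratio in addition to the chi-squared threshold ensures that only increases are flagged: the chi-squared test alone is symmetric and would also respond to a sharp drop in usage.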

Usage

term_innovation(
  m,
  date,
  m2 = NULL,
  date2 = NULL,
  lwindow = -7,
  rwindow = 7,
  date_unit = c("days", "hours", "minutes", "seconds"),
  min_chi = 5.024,
  min_ratio = 2,
  smooth = 1
)

Arguments

m

A CsparseMatrix, such as a document-term matrix (DTM)

date

A character vector that specifies a date for each row in m. If given, only pairs of rows whose dates fall within the date window (see lwindow, rwindow and date_unit) are compared.

m2

Optionally, use a different matrix for calculating the innovation scores. For example, if m is a DTM of press releases, m2 can be a DTM of news articles, to see if term usage increased in the news after the press release.

date2

If m2 is used, date2 must specify the date for each row in m2 (otherwise date will be ignored).

lwindow

If date (and date2) are used, lwindow sets the left side of the date window. For example, -10 means that rows are only matched with rows whose date is at most 10 date_units earlier.

rwindow

Like lwindow, but for the right side. For example, an lwindow of -1 and an rwindow of 1 with date_unit "days" means that rows are only matched if their dates are at most one day apart.

date_unit

The date unit used in lwindow and rwindow. Supports "days", "hours", "minutes" and "seconds". Note that this refers to the time distance between two rows: "days" does not refer to calendar days, but to periods of 24 hours.

min_chi

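The minimum chi-squared value. The default of 5.024 is the critical value of the chi-squared distribution with 1 degree of freedom at a significance level of 0.025.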

min_ratio

The minimum ratio (rwindow score / lwindow score)

smooth

The smoothing factor, which prevents ratios of -Inf or Inf (for example, when a term does not occur in the left window).
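The interplay of date, lwindow, rwindow and date_unit can be sketched in base R. The helper `window_rows()` below is hypothetical and not part of RNewsflow; it assumes dates are parsed as UTC timestamps and that rows with exactly the same timestamp as the document belong to neither window. It converts the signed time distance of every row to the chosen date_unit, so "days" means fixed 24-hour periods, as the date_unit description notes.

```r
# Hypothetical helper (not part of RNewsflow): find the rows whose dates
# fall in the left and right windows around one document's date.
window_rows <- function(doc_date, dates, lwindow = -7, rwindow = 7,
                        date_unit = c("days", "hours", "minutes", "seconds")) {
  date_unit <- match.arg(date_unit)
  # Fixed-length units: "days" means 24 hours, not calendar days
  secs_per_unit <- c(days = 86400, hours = 3600, minutes = 60, seconds = 1)
  t0 <- as.POSIXct(doc_date, tz = "UTC")
  t  <- as.POSIXct(dates, tz = "UTC")
  # Signed distance of every row from the document date, in date_unit
  delta <- as.numeric(difftime(t, t0, units = "secs")) /
    secs_per_unit[[date_unit]]
  # Assumption: rows at exactly the document time are in neither window
  list(before = which(delta >= lwindow & delta < 0),
       after  = which(delta > 0 & delta <= rwindow))
}

dates <- c("2023-01-01", "2023-01-05", "2023-01-10", "2023-01-12", "2023-01-20")
w  <- window_rows("2023-01-10", dates)                     # 7-day windows
w2 <- window_rows("2023-01-10", dates, -240, 240, "hours") # 10 days, in hours
w
w2
```

With the row indices of each window, per-term counts could then be taken with `colSums(m[w$before, , drop = FALSE])` and `colSums(m[w$after, , drop = FALSE])`; `colSums` works on both base matrices and Matrix sparse matrices.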

Value

A CsparseMatrix


RNewsflow documentation built on May 31, 2023, 6:53 p.m.