covars_make_baselines: compute mean baseline frequencies from Google ngrams
In kbenoit/sophistication: Functions to help measure textual sophistication


Computes the mean frequencies of terms in a text based on closest match to the Google unigram corpus, for the decade in which the text was recorded.
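As a rough illustration of matching a text to a decade, the sketch below snaps a year to the nearest year ending in 0 and clamps it to the documented 1780 to 2000 range. The helper `nearest_decade()` is hypothetical and not part of the package. The rounding and clamping rules are assumptions, and the package may resolve years differently.

```r
# Hypothetical helper (not from the package): snap a year to the nearest
# decade, clamped to the 1780-2000 range given for baseline_year.
nearest_decade <- function(year, min_year = 1780, max_year = 2000) {
  decade <- round(year / 10) * 10          # nearest year ending in 0
  pmin(pmax(decade, min_year), max_year)   # clamp to the supported range
}

nearest_decade(c(1793, 1848, 2012, 1700))
```

Years that fall outside the range are mapped to the closest endpoint in this sketch, rather than raising an error.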

covars_make_baselines(x, ...)

## S3 method for class 'snippet'
covars_make_baselines(x, ...)

## S3 method for class 'data.frame'
covars_make_baselines(x, text_field = "text", ...)

## S3 method for class 'corpus'
covars_make_baselines(x, ...)

## S3 method for class 'character'
covars_make_baselines(
  x,
  baseline_data = c("brown", "google"),
  baseline_year = 2000,
  baseline_word = "the",
  ...
)

`x`	the input object: a character vector, corpus, snippet, or data.frame of results (if already loaded)
`...`	additional arguments passed through to other functions
`text_field`	the name of the text field, if a data.frame, default is `"text"`
`baseline_data`	`"brown"`, `"google"`, or both (the default) to indicate the Brown corpus data or Google n-grams data, respectively.
`baseline_year`	a scalar or vector of the baseline years to choose for reference: a year ending in 0 from 1780 to 2000, or `NULL` to match a text to its nearest year (the year information is taken from the `textid` that is part of the CrowdFlower data). Does not apply if `baseline_data = "brown"`.
`baseline_word`	the word against which other word frequencies will be baselined. This defaults to `"the"` but can be any word found in the word frequency tables.
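To make the role of `baseline_word` concrete, here is a minimal base R sketch. It assumes that "baselining" means dividing each term's frequency by the frequency of the baseline word and then averaging over the terms that have a match. The counts in `freq` are invented for illustration and are not taken from the Brown or Google data, and `relative_to_baseline()` is a hypothetical helper, not the package's implementation.

```r
# Invented unigram counts for illustration only (NOT real corpus values)
freq <- c(the = 1000, art = 20, of = 600, ancient = 5)

# Hypothetical helper: mean frequency of tokens relative to a baseline word
relative_to_baseline <- function(tokens, freq, baseline_word = "the") {
  stopifnot(baseline_word %in% names(freq))
  matched <- freq[tokens]                     # NA for tokens not in the table
  ratios <- matched / freq[[baseline_word]]   # frequency relative to baseline
  mean(ratios, na.rm = TRUE)                  # average over matched tokens
}

tokens <- c("the", "art", "of", "husbandry")
relative_to_baseline(tokens, freq)
```

In this sketch, tokens with no entry in the frequency table (here `"husbandry"`) are dropped before averaging. Whether the package ignores or otherwise handles unmatched terms is not stated in this page.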

A data.frame of baseline frequencies, suitable for adding to the covariates for analysis by `BTm`.

data_matrix_google1grams(), data_integer_brownfreq()

txt <- c(d1 = quanteda::data_char_sampletext,
         d2 = "No if and or but.",
         d3 = "the")
covars_make_baselines(txt)

txt2 <- rep("The art of husbandry is ancient.", 3)
names(txt2) <- paste0("doc", 1:3)
covars_make_baselines(txt2, baseline_data = "google")
covars_make_baselines(txt2, baseline_data = "google",
                      baseline_year = c(1790, 1850, 1980))

## Not run: 
head(covars_make_baselines(file = "data/CF_results/f921916.csv"))
head(bt_input_make(file = "data/CF_results/f921916.csv",
                   covars = TRUE, readability_measure = "Flesch")$predictors)

## End(Not run)