covars_make_baselines: compute mean baseline frequencies from Google ngrams


View source: R/covars_make_baselines.R

Description

Computes the mean frequencies of the terms in a text, based on the closest match to the Google unigram corpus for the decade in which the text was recorded, or on the Brown corpus (see baseline_data).
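As a rough illustration of the idea only (not the package's actual implementation), the sketch below tokenizes a text, looks up each token in a small made-up frequency table standing in for the Google or Brown reference data, divides each frequency by that of the baseline word, and averages the results. The table `freq_table` and the function `mean_baselined_freq()` are hypothetical; the real function may tokenize, weight, or aggregate differently.

```r
# Hypothetical toy frequency table (counts per word) standing in for a
# reference corpus such as the Google unigrams or the Brown corpus.
freq_table <- c(the = 1000, art = 20, of = 600, husbandry = 1,
                is = 400, ancient = 5)

# Hypothetical helper (not the package's code): mean frequency of a text's
# words, each expressed relative to the frequency of a baseline word.
mean_baselined_freq <- function(text, freq, baseline_word = "the") {
  # lower-case and split on any run of non-letter characters
  tokens <- tolower(unlist(strsplit(text, "[^A-Za-z]+")))
  tokens <- tokens[tokens != ""]
  # keep only tokens that appear in the frequency table
  found <- tokens[tokens %in% names(freq)]
  if (length(found) == 0) return(NA_real_)
  # divide by the baseline word's frequency, then average over tokens
  mean(freq[found] / freq[[baseline_word]])
}

mean_baselined_freq("The art of husbandry is ancient.", freq_table)
```

Here the six tokens sum to 2026 counts, so the result is 2026 / (6 * 1000), about 0.338; choosing a rarer baseline word scales every ratio up by the same factor.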

Usage

covars_make_baselines(x, ...)

## S3 method for class 'snippet'
covars_make_baselines(x, ...)

## S3 method for class 'data.frame'
covars_make_baselines(x, text_field = "text", ...)

## S3 method for class 'corpus'
covars_make_baselines(x, ...)

## S3 method for class 'character'
covars_make_baselines(
  x,
  baseline_data = c("brown", "google"),
  baseline_year = 2000,
  baseline_word = "the",
  ...
)
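The Usage block lists one generic with several S3 methods. As a hedged sketch of how this kind of dispatch typically works in R, a data.frame method can extract the column named by `text_field` and pass it on to the character method. The generic `my_baselines()` and its methods are hypothetical, and the character method computes only a placeholder token count, not baseline frequencies.

```r
# Hypothetical generic illustrating S3 dispatch; not the package's code.
my_baselines <- function(x, ...) UseMethod("my_baselines")

# Character method: placeholder computation (whitespace-separated tokens
# per text) standing in for the real baseline-frequency calculation.
my_baselines.character <- function(x, ...) {
  doc <- names(x)
  if (is.null(doc)) doc <- paste0("text", seq_along(x))
  data.frame(doc = doc, ntoken = lengths(strsplit(x, "\\s+")),
             stringsAsFactors = FALSE)
}

# data.frame method: pull out the column named by text_field, keep the
# row names as document names, and re-dispatch on the character vector.
my_baselines.data.frame <- function(x, text_field = "text", ...) {
  if (!text_field %in% names(x))
    stop("column '", text_field, "' not found in x")
  txt <- as.character(x[[text_field]])
  names(txt) <- rownames(x)
  my_baselines(txt, ...)
}

df <- data.frame(snippet = c("No if and or but.", "the"),
                 stringsAsFactors = FALSE)
my_baselines(df, text_field = "snippet")
```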

Arguments

x

a character vector, corpus, snippet, or data.frame of results (if already loaded) containing the texts

...

additional arguments passed through to other functions

text_field

if x is a data.frame, the name of the column containing the text; default is "text"

baseline_data

"brown", "google", or both (the default), indicating whether to use the Brown corpus data, the Google n-grams data, or both.

baseline_year

a scalar or vector of baseline years to use as the reference, each a year ending in 0 from 1780 to 2000; or NULL to match each text to its nearest year (the year information is taken from the textid that is part of the Crowdflower data). Does not apply if baseline_data = "brown".

baseline_word

the word against which the other word frequencies are baselined. Defaults to "the", but can be any word found in the word frequency tables.
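When baseline_year = NULL, each text is matched to its nearest year. A minimal sketch of that kind of matching, assuming baselines exist for every decade from 1780 to 2000, is shown below. The helper `nearest_baseline_year()` is hypothetical; this page does not document how the package breaks ties or treats years outside that range, and here ties go to the earlier decade because `which.min()` returns the first minimum.

```r
# Hypothetical helper: map each document year to the closest available
# baseline decade (assumed here to be 1780, 1790, ..., 2000).
nearest_baseline_year <- function(year, available = seq(1780, 2000, by = 10)) {
  vapply(year,
         function(y) available[which.min(abs(available - y))],
         numeric(1))
}

nearest_baseline_year(c(1776, 1843, 1985, 2015))
```

Years before 1780 or after 2000 fall back to the nearest endpoint, and 1985 (equidistant from 1980 and 1990) maps to 1980.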

Value

a data.frame suitable for adding to the covariates for analysis by BTm() (from the BradleyTerry2 package)

See Also

data_matrix_google1grams(), data_integer_brownfreq()

Examples

txt <- c(d1 = quanteda::data_char_sampletext,
         d2 = "No if and or but.",
         d3 = "the")
covars_make_baselines(txt)

txt2 <- rep("The art of husbandry is ancient.", 3)
names(txt2) <- paste0("doc", 1:3)
covars_make_baselines(txt2, baseline_data = "google")
covars_make_baselines(txt2, baseline_data = "google",
                      baseline_year = c(1790, 1850, 1980))

## Not run: 
head(covars_make_baselines(file = "data/CF_results/f921916.csv"))
head(bt_input_make(file = "data/CF_results/f921916.csv",
                   covars = TRUE, readability_measure = "Flesch")$predictors)

## End(Not run)

kbenoit/sophistication documentation built on May 12, 2021, 5:57 a.m.