bt_input_make: format Crowdflower data for BT analysis
In kbenoit/sophistication: Functions to help measure textual sophistication


Format Crowdflower results for analysis by the BradleyTerry2 package. Can accept covariates computed by covars_make().

bt_input_make(
  x = NULL,
  file = NULL,
  format = c("chameleons", "binomial"),
  remove_gold = TRUE,
  remove_screeners = remove_gold,
  covars = FALSE,
  covars_baseline = FALSE,
  covars_pos = FALSE,
  normalize = TRUE,
  ...
)

`x`	data.frame of results, if already loaded
`file`	character; path to the file containing the Crowdflower results (.csv format). One of `x` or `file` must be specified.
`format`	the format of the returned data. `"chameleons"` is similar to the `BradleyTerry2::chameleons` data: a list of three data frames. `easier` and `harder` each have a single column `ID` holding the unique identifier of the snippet that won or lost; they have the same number of rows, since each row corresponds to a single pairwise comparison. `predictors` is a data.frame of predictors associated with each `ID`, whose row names correspond to the IDs in the `easier` and `harder` data.frames. `"binomial"` is similar to the extended example for the `BradleyTerry2::baseball` data in the documentation of `BradleyTerry2::BTm()`.
`remove_gold`	if `TRUE`, remove "gold" sentences from analysis
`remove_screeners`	if `TRUE`, remove "screener" sentences from analysis
`covars`	logical; if `TRUE` then add covariates for each snippet, taken directly from the Crowdflower saved data. Additional arguments to `covars_make()` can be passed through `...`
`covars_baseline`	logical; if `TRUE`, add summary baseline frequencies compared to Google and Brown corpora speech computed by `covars_make_baselines()`
`covars_pos`	logical; if `TRUE`, add frequencies of parts of speech computed by `covars_make_pos()`
`normalize`	if `TRUE` return appropriately normalized covariates, including parts of speech if applicable
`...`	additional arguments passed to `covars_make()`
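
To make the `"chameleons"` layout described under `format` concrete, here is a minimal hand-built sketch in base R. It does not call `bt_input_make()`; the snippet IDs and covariate values are hypothetical, and storing `ID` as a factor with shared levels is an assumption modelled on the `BradleyTerry2::chameleons` data rather than something this page confirms. The covariate name `W` is borrowed from the formula used in the examples.

```r
# Hypothetical snippet IDs and covariate values, for illustration only
snippet_ids <- c("s1", "s2", "s3")

# One row per pairwise comparison: row i of `easier` is paired with row i of `harder`.
# Assumption: ID is a factor sharing the same levels in both data frames.
easier <- data.frame(ID = factor(c("s1", "s1", "s2"), levels = snippet_ids))
harder <- data.frame(ID = factor(c("s2", "s3", "s3"), levels = snippet_ids))

# One row per snippet; row names correspond to the IDs used above
predictors <- data.frame(W = c(12, 20, 31), row.names = snippet_ids)

toy_input <- list(easier = easier, harder = harder, predictors = predictors)

# Structural checks implied by the description of the format
stopifnot(nrow(toy_input$easier) == nrow(toy_input$harder))
stopifnot(all(levels(toy_input$easier$ID) %in% rownames(toy_input$predictors)))

str(toy_input, max.level = 1)
```

The two checks mirror the constraints stated in the argument description: equal row counts in `easier` and `harder`, and every `ID` resolvable to a row of `predictors`.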

A data.frame (`"binomial"` format) or a list of data.frames (`"chameleons"` format) suitable for analysis by `BradleyTerry2::BTm()`.
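
For the `"binomial"` format, the call in the examples uses the columns `win1`, `win2`, `snippet1` and `snippet2`. The sketch below builds a toy data.frame with those column names in base R. The IDs and counts are hypothetical, and reading `win1` as the number of comparisons won by `snippet1` (and `win2` likewise for `snippet2`) is an assumption, not something this page states.

```r
# Hypothetical snippet IDs and win counts, for illustration only
snippet_ids <- c("s1", "s2", "s3")

toy_binomial <- data.frame(
  snippet1 = factor(c("s1", "s1", "s2"), levels = snippet_ids),
  snippet2 = factor(c("s2", "s3", "s3"), levels = snippet_ids),
  win1 = c(3, 4, 2),  # assumed: comparisons won by snippet1
  win2 = c(1, 0, 2)   # assumed: comparisons won by snippet2
)

# Total number of comparisons per pair (helper column, not part of the package output)
toy_binomial$n_comparisons <- toy_binomial$win1 + toy_binomial$win2

# Both snippet columns should share one set of factor levels
stopifnot(identical(levels(toy_binomial$snippet1), levels(toy_binomial$snippet2)))

print(toy_binomial)
```

Each row summarises one pair of snippets, in contrast to the `"chameleons"` layout, where each row is a single comparison.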

# compute abilities for the BT model from CF data
## Not run: 
library(BradleyTerry2)  # stops with an error if the package is not installed

## compute BT model without covariates
# in binomial format
inputdata1a <- bt_input_make(file = "data/CF_results/f921916.csv", format = "binomial")
BTmodel1a <- BTm(cbind(win1, win2), snippet1, snippet2, data = inputdata1a)
BTabils1a <- BTabilities(BTmodel1a)
head(BTabils1a)
# in "chameleons" format
inputdata1b <- bt_input_make(file = "data/CF_results/f921916.csv", format = "chameleons")
BTmodel1b <- BTm(player1 = easier, player2 = harder, id = "ID", data = inputdata1b)
BTabils1b <- BTabilities(BTmodel1b)
head(BTabils1b)

## compute BT model with covariates
inputdata2 <- bt_input_make(file = "data/CF_results/f921916.csv",
                            covars = TRUE, readability_measure = "Flesch")
BTmodel2 <- BTm(player1 = easier, player2 = harder,
                 formula = ~ W[ID] + St[ID] + C[ID] + Sy[ID] + Flesch[ID] + (1|ID),
                 id = "ID", data = inputdata2)
BTabils2 <- BTabilities(BTmodel2)
head(BTabils2[order(BTabils2[, 1], decreasing = TRUE), ], 10)

## compute BT model with covariates and POS
options(PYTHON_PATH = "/usr/local/bin")  # needed on Ken's system
inputdata3 <- bt_input_make(file = "data/CF_results/f921916.csv",
                            covars = TRUE, covars_pos = TRUE,
                            readability_measure = "Flesch")
BTmodel3 <- BTm(player1 = easier, player2 = harder,
                 formula = ~ W[ID] + St[ID] + C[ID] + Sy[ID] + Flesch[ID] +
                             ADJ[ID] + VERB[ID] + NOUN[ID] + (1|ID),
                 id = "ID", data = inputdata3)
BTabils3 <- BTabilities(BTmodel3)
head(BTabils3[order(BTabils3[, 1], decreasing = TRUE), ], 10)

## End(Not run)