bt_input_make: format Crowdflower data for BT analysis


View source: R/BradleyTerry_functions.R

Description

Format Crowdflower results for analysis by the BradleyTerry2 package. Can accept covariates computed by covars_make().

Usage

bt_input_make(
  x = NULL,
  file = NULL,
  format = c("chameleons", "binomial"),
  remove_gold = TRUE,
  remove_screeners = remove_gold,
  covars = FALSE,
  covars_baseline = FALSE,
  covars_pos = FALSE,
  normalize = TRUE,
  ...
)

Arguments

x

data.frame of results, if already loaded

file

character; path to the file containing the Crowdflower results (.csv format). One of x or file must be specified.

format

the format of the data:

"chameleons"

similar to the BradleyTerry2::chameleons data: a list of three data frames. easier and harder each contain a single column, ID, holding the unique identifier of the snippet that won (easier) or lost (harder). They have the same number of rows, since each row corresponds to a single pairwise comparison. predictors is a data.frame of predictors for each snippet, whose row names correspond to the IDs in the easier and harder data frames.

"binomial"

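similar to the extended example using the BradleyTerry2::baseball data in the documentation for BradleyTerry2::BTm(), where the response is a two-column matrix of win counts (see the binomial example under Examples).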
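To make the two layouts concrete, here is a minimal base-R sketch that builds a toy object in the "chameleons" layout and collapses it into a "binomial"-style layout. The snippet IDs, the covariate values, and the helper to_binomial() are hypothetical illustrations, not output of bt_input_make(); the column names snippet1, snippet2, win1 and win2 are assumed from the binomial example under Examples.

```r
# Hypothetical toy data (made-up IDs and values, not real Crowdflower results).
# "chameleons" layout: each row of easier/harder is one pairwise comparison,
# and the snippet in `easier` is the one that won.
ids <- c("s1", "s2", "s3")
easier <- data.frame(ID = factor(c("s1", "s1", "s2", "s3", "s1"), levels = ids))
harder <- data.frame(ID = factor(c("s2", "s3", "s3", "s2", "s2"), levels = ids))
# One row of covariates per snippet; row names match the IDs above.
predictors <- data.frame(W = c(12, 20, 31), row.names = ids)
chameleons_style <- list(easier = easier, harder = harder, predictors = predictors)

# Hypothetical helper: collapse comparisons into a "binomial"-style layout with
# one row per unordered pair; win1/win2 count how often snippet1/snippet2 won.
to_binomial <- function(winner, loser) {
  winner <- as.character(winner)
  loser <- as.character(loser)
  first <- pmin(winner, loser)   # canonical (alphabetical) order within a pair
  second <- pmax(winner, loser)
  pairs <- data.frame(snippet1 = first, snippet2 = second,
                      win1 = as.integer(winner == first),
                      win2 = as.integer(winner == second),
                      stringsAsFactors = FALSE)
  out <- aggregate(cbind(win1, win2) ~ snippet1 + snippet2, data = pairs, FUN = sum)
  # give both player columns the same factor levels
  lev <- sort(unique(c(winner, loser)))
  out$snippet1 <- factor(out$snippet1, levels = lev)
  out$snippet2 <- factor(out$snippet2, levels = lev)
  out
}

binom <- to_binomial(chameleons_style$easier$ID, chameleons_style$harder$ID)
print(binom)
```

With BradleyTerry2 installed, binom could be passed to BTm(cbind(win1, win2), snippet1, snippet2, data = binom), mirroring the binomial example. The sketch does not reproduce the gold/screener filtering or covariate handling that bt_input_make() performs.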

remove_gold

if TRUE, remove "gold" sentences from analysis

remove_screeners

if TRUE, remove "screener" sentences from analysis

covars

logical; if TRUE then add covariates for each snippet, taken directly from the Crowdflower saved data. Additional arguments to covars_make() can be passed through ...

covars_baseline

logical; if TRUE, add summary baseline frequencies relative to the Google and Brown corpora, computed by covars_make_baselines()

covars_pos

logical; if TRUE, add frequencies of parts of speech computed by covars_make_pos()

normalize

logical; if TRUE, return normalized covariates, including the part-of-speech frequencies if covars_pos = TRUE

...

additional arguments passed to covars_make()

Value

a data.frame (format = "binomial") or a list of three data frames (format = "chameleons"), suitable for analysis by BradleyTerry2::BTm()

Examples

# compute abilities for the BT model from CF data
## Not run: 
require(BradleyTerry2)

## compute BT model without covariates
# in binomial format
inputdata1a <- bt_input_make(file = "data/CF_results/f921916.csv", format = "binomial")
BTmodel1a <- BTm(cbind(win1, win2), snippet1, snippet2, data = inputdata1a)
BTabils1a <- BTabilities(BTmodel1a)
head(BTabils1a)
# in "chameleons" format
inputdata1b <- bt_input_make(file = "data/CF_results/f921916.csv", format = "chameleons")
BTmodel1b <- BTm(player1 = easier, player2 = harder, id = "ID", data = inputdata1b)
BTabils1b <- BTabilities(BTmodel1b)
head(BTabils1b)

## compute BT model with covariates
inputdata2 <- bt_input_make(file = "data/CF_results/f921916.csv",
                            covars = TRUE, readability_measure = "Flesch")
BTmodel2 <- BTm(player1 = easier, player2 = harder,
                 formula = ~ W[ID] + St[ID] + C[ID] + Sy[ID] + Flesch[ID] + (1|ID),
                 id = "ID", data = inputdata2)
BTabils2 <- BTabilities(BTmodel2)
head(BTabils2[order(BTabils2[, 1], decreasing = TRUE), ], 10)

## compute BT model with covariates and POS
options(PYTHON_PATH = "/usr/local/bin")  # location of Python; needed on Ken's system for covars_pos = TRUE, adjust for your own
inputdata3 <- bt_input_make(file = "data/CF_results/f921916.csv",
                            covars = TRUE, covars_pos = TRUE,
                            readability_measure = "Flesch")
BTmodel3 <- BTm(player1 = easier, player2 = harder,
                 formula = ~ W[ID] + St[ID] + C[ID] + Sy[ID] + Flesch[ID] +
                             ADJ[ID] + VERB[ID] + NOUN[ID] + (1|ID),
                 id = "ID", data = inputdata3)
BTabils3 <- BTabilities(BTmodel3)
head(BTabils3[order(BTabils3[, 1], decreasing = TRUE), ], 10)

## End(Not run)

kbenoit/sophistication documentation built on May 12, 2021, 5:57 a.m.