| test_WEAT | R Documentation |
Tabulate data (cosine similarity and standardized effect size) and conduct the permutation test of significance for the Word Embedding Association Test (WEAT) and Single-Category Word Embedding Association Test (SC-WEAT).
For WEAT, a two-sample permutation test is conducted (i.e., rearrangements of data).
For SC-WEAT, a one-sample permutation test is conducted (i.e., rearrangements of +/- signs of the data).
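The two kinds of permutation test described above can be sketched in base R. This is only an illustration of the general technique, not the package's internal code: the helper names `perm_test_two_sample()` and `perm_test_one_sample()`, the toy numbers, and the one-sided convention (testing in the positive direction) are all assumptions made for this example.

```r
# Two-sample permutation test: repeatedly reassign the pooled values
# to two groups of the original sizes and recompute the mean difference.
perm_test_two_sample = function(x, y, nsim = 10000, side = 2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  pooled = c(x, y)
  observed = mean(x) - mean(y)
  simulated = replicate(nsim, {
    idx = sample(length(pooled), length(x))
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # Proportion of resamples at least as extreme as the observed statistic
  if (side == 1) mean(simulated >= observed)
  else mean(abs(simulated) >= abs(observed))
}

# One-sample permutation test: repeatedly flip the sign of each value
# at random and recompute the mean.
perm_test_one_sample = function(x, nsim = 10000, side = 2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  observed = mean(x)
  simulated = replicate(nsim, {
    signs = sample(c(-1, 1), length(x), replace = TRUE)
    mean(signs * x)
  })
  if (side == 1) mean(simulated >= observed)
  else mean(abs(simulated) >= abs(observed))
}

# Toy data (invented for illustration)
x = c(0.31, 0.28, 0.35, 0.30)
y = c(0.10, 0.12, 0.08, 0.15)
p_two = perm_test_two_sample(x, y, nsim = 2000, seed = 1)
p_one = perm_test_one_sample(c(0.20, 0.25, 0.18, 0.22, 0.30), nsim = 2000, seed = 1)
```

Both helpers sample random rearrangements; an exact test would instead enumerate every possible rearrangement, which is feasible only for small word sets.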
test_WEAT(
data,
T1,
T2,
A1,
A2,
use.pattern = FALSE,
labels = list(),
p.perm = TRUE,
p.nsim = 10000,
p.side = 2,
seed = NULL,
pooled.sd = "Caliskan"
)
data |
A |
T1, T2 |
Target words (a vector of words or a regular expression pattern). If only |
A1, A2 |
Attribute words (a vector of words or a regular expression pattern). Both must be specified. |
use.pattern |
Defaults to FALSE. |
labels |
Labels for target and attribute concepts (a named |
p.perm |
Permutation test to get the exact or approximate p value of the overall effect. Defaults to TRUE. |
p.nsim |
Number of samples for resampling in the permutation test. Defaults to 10000. If |
p.side |
One-sided ( Caliskan et al. (2017) reported a one-sided p value for WEAT. Here, I suggest reporting a two-sided p value as a more conservative estimate. Users take full responsibility for this choice.
|
seed |
Random seed for reproducible results of the permutation test. Defaults to NULL. |
pooled.sd |
Method used to calculate the pooled SD for the effect size estimate in WEAT.
|
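For orientation, the WEAT effect size in Caliskan et al. (2017) is the difference between the two target sets' mean differential association, divided by the standard deviation of the differential association across all target words. The sketch below assumes this is what the "Caliskan" option refers to and uses R's `sd()` (n - 1 denominator); the function name `weat_effect_size()` and the input values are hypothetical.

```r
# diff_T1, diff_T2: for each target word, mean cosine similarity with the
# A1 words minus mean cosine similarity with the A2 words.
weat_effect_size = function(diff_T1, diff_T2) {
  # The SD is taken over the union of both target sets
  (mean(diff_T1) - mean(diff_T2)) / sd(c(diff_T1, diff_T2))
}

# Toy differential associations (invented for illustration)
diff_T1 = c(0.10, 0.14, 0.08, 0.12)
diff_T2 = c(-0.05, -0.02, -0.08, -0.01)
d = weat_effect_size(diff_T1, diff_T2)
```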
A list object of new class weat:
words.valid | Valid (actually matched) words |
words.not.found | Words not found |
data.raw | A data.table of cosine similarities between all word pairs |
data.mean | A data.table of mean cosine similarities across all attribute words |
data.diff | A data.table of differential mean cosine similarities between the two attribute concepts |
eff.label | Description for the difference between the two attribute concepts |
eff.type | Effect type: WEAT or SC-WEAT |
eff | Raw effect, standardized effect size, and p value (if p.perm=TRUE) |
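To make the three data tables above concrete, here is a rough base R sketch of the quantities they describe: pairwise cosine similarities, their means per attribute concept, and the difference between those means. The 3-dimensional "embeddings" are invented, plain matrices stand in for data.table objects, and `cosine_similarity()` is a helper written for this example, so the layout will not match the actual returned tables.

```r
cosine_similarity = function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy word vectors (invented for illustration)
emb = rbind(
  king  = c(0.9, 0.1, 0.3),
  queen = c(0.2, 0.9, 0.3),
  man   = c(0.8, 0.2, 0.1),
  he    = c(0.7, 0.1, 0.2),
  woman = c(0.1, 0.9, 0.2),
  she   = c(0.2, 0.8, 0.1)
)
targets = c("king", "queen")
A1 = c("man", "he")
A2 = c("woman", "she")

# Analogue of data.raw: similarity of every target word with every attribute word
raw = sapply(c(A1, A2), function(a) {
  sapply(targets, function(w) cosine_similarity(emb[w, ], emb[a, ]))
})

# Analogue of data.mean: mean similarity per target word and attribute concept
mean_A1 = rowMeans(raw[, A1, drop = FALSE])
mean_A2 = rowMeans(raw[, A2, drop = FALSE])

# Analogue of data.diff: differential mean similarity (A1 minus A2)
diff_A = mean_A1 - mean_A2
```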
Download pre-trained word vectors data (.RData): https://psychbruce.github.io/WordVector_RData.pdf
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186.
tab_similarity()
dict_expand()
dict_reliability()
test_RND()
## cc() is more convenient than c()!
weat = test_WEAT(
demodata,
labels=list(T1="King", T2="Queen", A1="Male", A2="Female"),
T1=cc("king, King"),
T2=cc("queen, Queen"),
A1=cc("male, man, boy, brother, he, him, his, son"),
A2=cc("female, woman, girl, sister, she, her, hers, daughter"),
seed=1)
weat
sc_weat = test_WEAT(
demodata,
labels=list(T1="Occupation", A1="Male", A2="Female"),
T1=cc("
architect, boss, leader, engineer, CEO, officer, manager,
lawyer, scientist, doctor, psychologist, investigator,
consultant, programmer, teacher, clerk, counselor,
salesperson, therapist, psychotherapist, nurse"),
A1=cc("male, man, boy, brother, he, him, his, son"),
A2=cc("female, woman, girl, sister, she, her, hers, daughter"),
seed=1)
sc_weat
## Not run:
## the same as the first example, but using regular expression
weat = test_WEAT(
demodata,
labels=list(T1="King", T2="Queen", A1="Male", A2="Female"),
use.pattern=TRUE, # use regular expression below
T1="^[kK]ing$",
T2="^[qQ]ueen$",
A1="^male$|^man$|^boy$|^brother$|^he$|^him$|^his$|^son$",
A2="^female$|^woman$|^girl$|^sister$|^she$|^her$|^hers$|^daughter$",
seed=1)
weat
## replicating Caliskan et al.'s (2017) results
## WEAT7 (Table 1): d = 1.06, p = .018
## (requiring installation of the `sweater` package)
Caliskan.WEAT7 = test_WEAT(
as_wordvec(sweater::glove_math),
labels=list(T1="Math", T2="Arts", A1="Male", A2="Female"),
T1=cc("math, algebra, geometry, calculus, equations, computation, numbers, addition"),
T2=cc("poetry, art, dance, literature, novel, symphony, drama, sculpture"),
A1=cc("male, man, boy, brother, he, him, his, son"),
A2=cc("female, woman, girl, sister, she, her, hers, daughter"),
p.side=1, seed=1234)
Caliskan.WEAT7
# d = 1.055, p = .0173 (= 173 counts / 10000 permutation samples)
## replicating Caliskan et al.'s (2017) supplemental results
## WEAT7 (Table S1): d = 0.97, p = .027
Caliskan.WEAT7.supp = test_WEAT(
demodata,
labels=list(T1="Math", T2="Arts", A1="Male", A2="Female"),
T1=cc("math, algebra, geometry, calculus, equations, computation, numbers, addition"),
T2=cc("poetry, art, dance, literature, novel, symphony, drama, sculpture"),
A1=cc("male, man, boy, brother, he, him, his, son"),
A2=cc("female, woman, girl, sister, she, her, hers, daughter"),
p.side=1, seed=1234)
Caliskan.WEAT7.supp
# d = 0.966, p = .0221 (= 221 counts / 10000 permutation samples)
## End(Not run)
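The regular expression example above anchors every alternative with ^ and $. A small base R sketch shows why that matters when a pattern is matched against a vocabulary. It uses `grep()` on an invented word list; how `test_WEAT()` applies the pattern internally is not shown in this page, so treat this as an illustration of regex behavior only.

```r
# Invented mini-vocabulary
vocab = c("man", "woman", "he", "she", "her", "hers", "male", "female", "human")

# Anchored alternatives match whole words only
anchored = grep("^male$|^man$|^he$", vocab, value = TRUE)

# Without anchors, substrings match too (e.g., "woman" contains "man")
unanchored = grep("male|man|he", vocab, value = TRUE)
```

In this toy vocabulary the unanchored pattern also picks up words for the opposite concept, which would contaminate the attribute set.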