weat: Speedy Word Embedding Association Test

View source: R/sweater.R

weat R Documentation

Speedy Word Embedding Association Test

Description

This function tests the bias in a set of word embeddings using the method by Caliskan et al. (2017). If possible, please use query() instead.

Usage

weat(w, S_words, T_words, A_words, B_words, verbose = FALSE)

Arguments

w

a numeric matrix of word embeddings, e.g. from read_word2vec()

S_words

a character vector of the first set of target words. In an example of studying gender stereotype, it can include occupations such as programmer, engineer, scientist...

T_words

a character vector of the second set of target words. In an example of studying gender stereotype, it can include occupations such as nurse, teacher, librarian...

A_words

a character vector of the first set of attribute words. In an example of studying gender stereotype, it can include words such as man, male, he, his.

B_words

a character vector of the second set of attribute words. In an example of studying gender stereotype, it can include words such as woman, female, she, her.

verbose

logical, whether to display information

Value

A list with class "weat" containing the following components:

  • $S_diff: for each word in S_words, the difference between its mean cosine similarity to the words in A_words and its mean cosine similarity to the words in B_words

  • $T_diff: for each word in T_words, the difference between its mean cosine similarity to the words in A_words and its mean cosine similarity to the words in B_words

  • $S_words: the input S_words

  • $T_words: the input T_words

  • $A_words: the input A_words

  • $B_words: the input B_words

weat_es() can be used to obtain the effect size of the test; weat_resampling() can be used for a test of significance.
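The per-word differences described above can be sketched in base R. This is only an illustration of the association measure defined in Caliskan et al. (2017), not the package's actual implementation; the helper names cosine_sim() and word_diff() and the toy embedding matrix are hypothetical and made up for this sketch.

```r
# Cosine similarity between two numeric vectors.
cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Hypothetical helper: association of one word with A_words relative to B_words,
# i.e. mean cosine similarity to A_words minus mean cosine similarity to B_words.
word_diff <- function(w, word, A_words, B_words) {
  sim_a <- vapply(A_words, function(a) cosine_sim(w[word, ], w[a, ]), numeric(1))
  sim_b <- vapply(B_words, function(b) cosine_sim(w[word, ], w[b, ]), numeric(1))
  mean(sim_a) - mean(sim_b)
}

# Toy 2-dimensional embeddings with words as row names (invented values,
# chosen so the result is easy to check by hand).
toy <- rbind(
  math   = c(1, 0),
  poetry = c(0, 1),
  he     = c(1, 0),
  she    = c(0, 1)
)

# One value per target word, analogous to $S_diff and $T_diff.
S_diff <- vapply("math", function(s) word_diff(toy, s, "he", "she"), numeric(1))
T_diff <- vapply("poetry", function(t) word_diff(toy, t, "he", "she"), numeric(1))
```

In this toy setup "math" points in the same direction as "he" and is orthogonal to "she", so its difference is positive, while "poetry" behaves the opposite way. With real embeddings the values are much smaller in magnitude, which is why an effect size and a significance test are computed on top of them.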

References

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186. doi:10.1126/science.aal4230

Examples

# Reproduce the number in Caliskan et al. (2017) - Table 1, "Math vs. Arts"
data(glove_math)
S1 <- c("math", "algebra", "geometry", "calculus", "equations",
        "computation", "numbers", "addition")
T1 <- c("poetry", "art", "dance", "literature", "novel",
        "symphony", "drama", "sculpture")
A1 <- c("male", "man", "boy", "brother", "he", "him", "his", "son")
B1 <- c("female", "woman", "girl", "sister", "she", "her", "hers", "daughter")
sw <- weat(glove_math, S1, T1, A1, B1)
weat_es(sw)

sweater documentation built on Nov. 7, 2023, 5:08 p.m.