nas: Calculate Normalized Association Score


nas R Documentation

Calculate Normalized Association Score

Description

This function quantifies the bias in a set of word embeddings using the normalized association score of Caliskan et al. (2017). Compared with WEAT, introduced in the same paper, this method is better suited to continuous ground-truth data. See Figures 1 and 2 of the original paper. If possible, please use query() instead.

Usage

nas(w, S_words, A_words, B_words, verbose = FALSE)

Arguments

w

a numeric matrix of word embeddings, e.g. from read_word2vec()

S_words

a character vector of the first set of target words. In an example of studying gender stereotypes, it can include occupations such as programmer, engineer, scientist.

A_words

a character vector of the first set of attribute words. In an example of studying gender stereotype, it can include words such as man, male, he, his.

B_words

a character vector of the second set of attribute words. In an example of studying gender stereotype, it can include words such as woman, female, she, her.

verbose

logical, whether to display information

Value

A list with class "nas" containing the following components:

  • $P a vector of normalized association scores, one for every word in S_words

  • $raw a list of raw results used for calculating normalized association scores

  • $S_words the input S_words

  • $A_words the input A_words

  • $B_words the input B_words
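
To give a feel for what a per-word score such as $P expresses, the following is a minimal base-R sketch. It assumes the score for a target word is the difference between its mean cosine similarity to A_words and to B_words, divided by the standard deviation of its similarities to all attribute words (the word-level association in Caliskan et al., 2017). The helper toy_nas() and the embedding values are hypothetical and made up for illustration; this does not call the package and is not its implementation.

```r
# Toy embedding matrix: rows are words, columns are dimensions.
# The numbers are invented for illustration only.
w <- rbind(
  engineer = c(0.9, 0.1, 0.3),
  nurse    = c(0.1, 0.9, 0.3),
  he       = c(1.0, 0.0, 0.2),
  man      = c(0.8, 0.1, 0.1),
  she      = c(0.0, 1.0, 0.2),
  woman    = c(0.1, 0.8, 0.1)
)

# Cosine similarity between two numeric vectors.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Hypothetical helper (not part of sweater): for each target word, take
# mean similarity to A minus mean similarity to B, then divide by the
# standard deviation of the similarities to all attribute words.
toy_nas <- function(w, S_words, A_words, B_words) {
  vapply(S_words, function(s) {
    sim_a <- vapply(A_words, function(a) cosine(w[s, ], w[a, ]), numeric(1))
    sim_b <- vapply(B_words, function(b) cosine(w[s, ], w[b, ]), numeric(1))
    (mean(sim_a) - mean(sim_b)) / sd(c(sim_a, sim_b))
  }, numeric(1))
}

P <- toy_nas(w,
             S_words = c("engineer", "nurse"),
             A_words = c("he", "man"),
             B_words = c("she", "woman"))
print(P)
```

Under this assumed definition, a positive score means the target word sits closer to A_words and a negative score means it sits closer to B_words; in the toy data, "engineer" comes out positive and "nurse" negative.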

References

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186. doi:10.1126/science.aal4230


sweater documentation built on Nov. 7, 2023, 5:08 p.m.