mac: Mean average cosine similarity

View source: R/mac.R

macR Documentation

Mean average cosine similarity

Description

This function calculates the mean average cosine similarity (MAC) score proposed in Manzini et al (2019). If possible, please use query() instead.

Usage

mac(w, S_words, A_words, verbose = FALSE)

Arguments

w

a numeric matrix of word embeddings, e.g. from read_word2vec()

S_words

a character vector of the first set of target words. In an example of studying gender stereotype, it can include occupations such as programmer, engineer, scientists...

A_words

a character vector of the first set of attribute words. In an example of studying gender stereotype, it can include words such as man, male, he, his.

verbose

logical, whether to display information

Value

A list with class "mac" containing the following components:

  • ⁠$P⁠ a vector of cosine similarity values for every word in S_words

  • ⁠$S_words⁠ the input S_words

  • ⁠$A_words⁠ the input A_words mac_es() can be used to obtain the effect size of the test.

References

Manzini, T., Lim, Y. C., Tsvetkov, Y., & Black, A. W. (2019). Black is to criminal as caucasian is to police: Detecting and removing multiclass bias in word embeddings. arXiv preprint arXiv:1904.04047.

Examples

data(googlenews)
S1 <- c("janitor", "statistician", "midwife", "bailiff", "auctioneer",
"photographer", "geologist", "shoemaker", "athlete", "cashier", "dancer",
"housekeeper", "accountant", "physicist", "gardener", "dentist", "weaver",
"blacksmith", "psychologist", "supervisor", "mathematician", "surveyor",
"tailor", "designer", "economist", "mechanic", "laborer", "postmaster",
"broker", "chemist", "librarian", "attendant", "clerical", "musician",
"porter", "scientist", "carpenter", "sailor", "instructor", "sheriff",
"pilot", "inspector", "mason", "baker", "administrator", "architect",
"collector", "operator", "surgeon", "driver", "painter", "conductor",
"nurse", "cook", "engineer", "retired", "sales", "lawyer", "clergy",
"physician", "farmer", "clerk", "manager", "guard", "artist", "smith",
"official", "police", "doctor", "professor", "student", "judge", "teacher",
"author", "secretary", "soldier")
A1 <- c("he", "son", "his", "him", "father", "man", "boy", "himself",
"male", "brother", "sons", "fathers", "men", "boys", "males", "brothers",
"uncle", "uncles", "nephew", "nephews")
x <- mac(googlenews, S1, A1)
x$P

sweater documentation built on Nov. 7, 2023, 5:08 p.m.