semaxis: Characterise word semantics using the SemAxis framework
In chainsawriot/sweater: Speedy Word Embedding Association Test and Extras Using R

semaxis

R Documentation

Characterise word semantics using the SemAxis framework

Description

This function calculates the semantic axis and the scores of the target words using the SemAxis framework proposed by An et al. (2018). If possible, please use query() instead.
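To make the two quantities concrete, here is a minimal base R sketch of the computation described in An et al. (2018): the axis is the difference between the centroids of the two attribute sets, and each target word is scored by its cosine similarity with that axis. The helper `semaxis_sketch()`, the toy embedding values, and the sign convention (A minus B) are assumptions made for illustration; they are not the package's implementation.

```r
# Toy embeddings: rows are words, columns are dimensions (values are made up).
w <- rbind(
  he       = c( 1.0, 0.1),
  she      = c(-1.0, 0.1),
  engineer = c( 0.6, 0.8),
  nurse    = c(-0.6, 0.8)
)

cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Hypothetical helper, not part of sweater.
semaxis_sketch <- function(w, S_words, A_words, B_words) {
  # Axis: centroid of A minus centroid of B (the sign convention is an assumption).
  V <- colMeans(w[A_words, , drop = FALSE]) - colMeans(w[B_words, , drop = FALSE])
  # Score: cosine similarity between each target word and the axis.
  P <- vapply(S_words, function(s) cosine(w[s, ], V), numeric(1))
  list(P = P, V = V)
}

res <- semaxis_sketch(w, c("engineer", "nurse"), "he", "she")
res$P
```

In this sketch, a positive score means the target word lies closer to the A pole and a negative score means it lies closer to the B pole.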

Usage

semaxis(w, S_words, A_words, B_words, l = 0, verbose = FALSE)

Arguments

`w`	a numeric matrix of word embeddings, e.g. from `read_word2vec()`
`S_words`	a character vector of target words. In an example of studying gender stereotype, it can include occupations such as programmer, engineer, scientist.
`A_words`	a character vector of the first set of attribute words. In an example of studying gender stereotype, it can include words such as man, male, he, his.
`B_words`	a character vector of the second set of attribute words. In an example of studying gender stereotype, it can include words such as woman, female, she, her.
`l`	an integer indicating the number of words used to augment each word in A and B, based on cosine similarity; see An et al. (2018). Defaults to 0 (no augmentation).
`verbose`	logical, whether to display information
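
The augmentation controlled by `l` can be pictured as follows: for each attribute word, the `l` words closest to it by cosine similarity are added to the attribute set before the axis is computed. The sketch below is a base R illustration under that reading of An et al. (2018); `augment_pole()` and the toy embeddings are hypothetical, and the package may select or combine the neighbours differently.

```r
# Toy embeddings: rows are words, columns are dimensions (values are made up).
w <- rbind(
  man   = c( 1.0, 0.0),
  boy   = c( 0.9, 0.1),
  woman = c(-1.0, 0.0),
  girl  = c(-0.9, 0.1),
  math  = c( 0.2, 1.0)
)

# Hypothetical helper, not part of sweater: add the l nearest neighbours
# (by cosine similarity) of each pole word to the pole set.
augment_pole <- function(w, words, l = 0) {
  if (l == 0) return(words)
  unit <- w / sqrt(rowSums(w^2))        # row-normalise so dot product = cosine
  extra <- unlist(lapply(words, function(a) {
    sims <- drop(unit %*% unit[a, ])    # cosine of every word with `a`
    sims[a] <- -Inf                     # exclude the word itself
    names(sort(sims, decreasing = TRUE))[seq_len(l)]
  }))
  unique(c(words, extra))
}

augment_pole(w, "man", l = 1)
```

With `l = 0` the attribute set is returned unchanged, which matches the documented default of no augmentation.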

Value

A list with class "semaxis" containing the following components:

`$P`	for each word in S, the score according to SemAxis
`$V`	the semantic axis vector
`$S_words`	the input S_words
`$A_words`	the input A_words
`$B_words`	the input B_words

References

An, J., Kwak, H., & Ahn, Y. Y. (2018). SemAxis: A lightweight framework to characterize domain-specific word semantics beyond sentiment. arXiv preprint arXiv:1806.05521.

Examples

data(glove_math)
S1 <- c("math", "algebra", "geometry", "calculus", "equations",
        "computation", "numbers", "addition")
A1 <- c("male", "man", "boy", "brother", "he", "him", "his", "son")
B1 <- c("female", "woman", "girl", "sister", "she", "her", "hers", "daughter")
semaxis(glove_math, S1, A1, B1, l = 0)$P

chainsawriot/sweater documentation built on Feb. 2, 2025, 3:53 a.m.