frequencies: Lexical frequencies of all available words in SUBTLEX

Description

A dataset containing the absolute, relative, and Zipf-transformed frequencies of 456,546 words in Catalan, English, and Spanish.

Usage

data("frequencies")

Format

A data frame with 456,546 rows and 5 variables:

word

Orthographic word form

language

Language the word form belongs to

frequency_abs

Absolute frequency (raw counts in the corpus)

frequency_rel

Relative frequency (counts per million words in the corpus)

frequency_zipf

Zipf-transformed frequency (log10(frequency_rel) + 3)
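
As an illustration of how the three frequency columns relate, the sketch below derives relative and Zipf frequencies from raw counts in a toy data frame. The counts and the corpus size are made up, and the Zipf step assumes the scale defined by Van Heuven et al. (2014): log10 of the per-million frequency, plus 3. It is not necessarily how the shipped values were computed.

```r
# Toy stand-in for a slice of the dataset; counts are hypothetical
toy <- data.frame(
  word = c("the", "house", "zebra"),
  language = "English",
  frequency_abs = c(50000, 1000, 10),
  stringsAsFactors = FALSE
)

# Hypothetical corpus size in tokens (an assumption, not a real SUBTLEX size)
corpus_size <- 2e6

# Relative frequency: counts per million words in the corpus
toy$frequency_rel <- toy$frequency_abs / corpus_size * 1e6

# Zipf scale, assuming the Van Heuven et al. (2014) definition
toy$frequency_zipf <- log10(toy$frequency_rel) + 3

print(toy)
```

Because the Zipf scale is logarithmic, a difference of one Zipf unit corresponds to a tenfold difference in relative frequency.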

Source

Catalan: https://link.springer.com/article/10.3758%2Fs13428-019-01233-1, English: https://journals.sagepub.com/doi/full/10.1080/17470218.2013.850521, Spanish: https://psycnet.apa.org/record/2011-19447-001

References

English

Van Heuven, W. J., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). SUBTLEX-UK: A new and improved word frequency database for British English. Quarterly Journal of Experimental Psychology, 67(6), 1176-1190.

Spanish

Cuetos, F., Glez-Nosti, M., Barbón, A., & Brysbaert, M. (2011). SUBTLEX-ESP: Frecuencias de las palabras españolas basadas en los subtítulos de las películas. Psicológica, 32(2), 133-144.

Catalan

Boada, R., Guasch, M., Haro, J., Demestre, J., & Ferré, P. (2020). SUBTLEX-CAT: Subtitle word frequencies and contextual diversity for Catalan. Behavior Research Methods, 52(1), 360-375.

Examples

data("frequencies")
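
Once loaded, the data frame can be queried with base R. The sketch below uses a small stand-in data frame with the documented column names so that it runs without the package installed; the rows, the frequency values, and the labels used in the language column are all hypothetical.

```r
# Stand-in with some of the documented columns; all values are hypothetical
frequencies_toy <- data.frame(
  word = c("casa", "house", "casa", "dog"),
  language = c("Spanish", "English", "Catalan", "English"),
  frequency_zipf = c(5.9, 5.7, 5.8, 5.2),
  stringsAsFactors = FALSE
)

# Keep the rows for one language
english <- subset(frequencies_toy, language == "English")

# Sort from most to least frequent
english <- english[order(english$frequency_zipf, decreasing = TRUE), ]

# Look up one word form across languages
casa <- frequencies_toy[frequencies_toy$word == "casa", ]

print(english)
print(casa)
```

With the package installed, the same calls should apply to the real object by replacing frequencies_toy with frequencies and the language label with whatever values the language column actually uses.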

bilingual-project/jtracer documentation built on Dec. 19, 2021, 9:42 a.m.