lexops: LexOPS' inbuilt variables
In JackEdTaylor/LexOPS: A Package and Shiny App for Generating Matched Stimuli

lexops

R Documentation


Description

A data frame containing 68 variables. When this data frame is used in a generate pipeline, the variables drawn from it can be cited with the cite_design function. The variables included in LexOPS are not intended to be exhaustive; they provide some useful, frequently used variables and illustrative examples. The LexOPS functions will accept any data frame with a similar structure to LexOPS::lexops (one word/stimulus per row, with each feature stored in its own column). Other datasets can be joined to it, for example with the dplyr join functions.
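As a minimal sketch of the join described above, the example below attaches a custom set of norms to a lexops-shaped data frame with dplyr::left_join. The data frames toy_lexicon and my_norms, the column my_rating, and all values in them are made up for illustration; with the package installed, LexOPS::lexops would take the place of toy_lexicon.

```r
library(dplyr)

# Toy stand-in for LexOPS::lexops: one word per row, features in other columns.
# (Hypothetical data; only the column names string and Length mirror lexops.)
toy_lexicon <- data.frame(
  string = c("cat", "dog", "tree"),
  Length = c(3L, 3L, 4L),
  stringsAsFactors = FALSE
)

# Hypothetical custom norms, keyed by the same "string" column.
my_norms <- data.frame(
  string = c("cat", "tree"),
  my_rating = c(4.2, 3.1),
  stringsAsFactors = FALSE
)

# left_join keeps every row of the lexicon; words without a rating get NA.
combined <- left_join(toy_lexicon, my_norms, by = "string")
```

A left join is used so that no words are dropped from the lexicon; an inner join would instead keep only the words that have a value in both data frames.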

Usage

lexops

Format

A data frame with 262532 rows and 68 variables:

string: Strings (words/lemmas).
CMU.1letter: One-letter ARPABET representations of the main (North American) pronunciation according to the CMU Pronouncing Dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict).
CMU.PrN: Number of possible (North American) pronunciations according to the CMU Pronouncing Dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict).
eSpeak.br_1letter: One-letter representations of pronunciations according to the standard British pronunciation calculated by the eSpeak speech synthesiser (http://espeak.sourceforge.net/).
eSpeak.br_IPA: International Phonetic Alphabet representations of pronunciations according to the standard British pronunciation calculated by the eSpeak speech synthesiser (http://espeak.sourceforge.net/).
Zipf.SUBTLEX_UK: Zipf frequencies (log10(frequency_per_million)+3) calculated from UK subtitles (https://doi.org/10.1080/17470218.2013.850521).
Zipf.SUBTLEX_US: Zipf frequencies (log10(frequency_per_million)+3) calculated from US subtitles (https://doi.org/10.3758/BRM.41.4.977).
Zipf.BNC.Spoken: Zipf frequencies (log10(frequency_per_million)+3) calculated from the spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
Zipf.BNC.Written: Zipf frequencies (log10(frequency_per_million)+3) calculated from the written texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
Zipf.BNC.All: Zipf frequencies (log10(frequency_per_million)+3) calculated from both the written and spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
fpmw.SUBTLEX_UK: Frequencies per million words calculated from UK subtitles (https://doi.org/10.1080/17470218.2013.850521).
fpmw.SUBTLEX_US: Frequencies per million words calculated from US subtitles (https://doi.org/10.3758/BRM.41.4.977).
fpmw.BNC.Spoken: Frequencies per million words calculated from the spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
fpmw.BNC.Written: Frequencies per million words calculated from the written texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
fpmw.BNC.All: Frequencies per million words calculated from both the written and spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
PoS.SUBTLEX_UK: Dominant parts of speech according to SUBTLEX-UK (https://doi.org/10.1080/17470218.2013.850521).
PoS.BNC.Spoken: Dominant parts of speech according to an analysis of the spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
PoS.BNC.Written: Dominant parts of speech according to an analysis of the written texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
PoS.BNC.All: Dominant parts of speech according to an analysis of both the written and spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
PoS.ELP: Dominant parts of speech according to the English Lexicon Project (http://doi.org/10.3758/BF03193014).
Length: Number of characters in the string.
BG.SUBTLEX_UK: Mean character bigram probabilities for each string, calculated using frequency data from UK subtitles (https://doi.org/10.1080/17470218.2013.850521).
BG.SUBTLEX_US: Mean character bigram probabilities for each string, calculated using frequency data from US subtitles (https://doi.org/10.3758/BRM.41.4.977).
BG.BNC.Spoken: Mean character bigram probabilities for each string, calculated using frequency data from the spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
BG.BNC.Written: Mean character bigram probabilities for each string, calculated using frequency data from the written texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
BG.BNC.All: Mean character bigram probabilities for each string, calculated using frequency data from both the written and spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
ON.OLD20: Orthographic Neighbourhood size, indexed by Orthographic Levenshtein Distance 20 (OLD20), calculated using all words in the LexOPS database.
ON.Colthearts_N: Orthographic Neighbourhood size, indexed by Coltheart's N, calculated using all words in the LexOPS database.
ON.Log_OLD20: The log of ON.OLD20.
ON.Log_Colthearts_N: The log of ON.Colthearts_N.
Syllables.CMU: Number of syllables of the CMU Pronouncing Dictionary's (http://www.speech.cs.cmu.edu/cgi-bin/cmudict) main pronunciation.
Syllables.eSpeak.br: Number of syllables, according to the standard British pronunciation calculated by the eSpeak speech synthesiser (http://espeak.sourceforge.net/).
Phonemes.CMU: Number of phonemes of the CMU Pronouncing Dictionary's (http://www.speech.cs.cmu.edu/cgi-bin/cmudict) main pronunciation.
Phonemes.eSpeak.br: Number of phonemes, according to the standard British pronunciation calculated by the eSpeak speech synthesiser (http://espeak.sourceforge.net/).
Rhyme.CMU: Rhyme sound of the CMU Pronouncing Dictionary's (http://www.speech.cs.cmu.edu/cgi-bin/cmudict) main pronunciation.
Rhyme.eSpeak.br: Rhyme sound of the standard British pronunciation calculated by the eSpeak speech synthesiser (http://espeak.sourceforge.net/).
PN.PLD20.CMU: Phonological Neighbourhood size, indexed by Phonological Levenshtein Distance 20 (PLD20), calculated using all words with a CMU pronunciation.
PN.PLD20.eSpeak.br: Phonological Neighbourhood size, indexed by Phonological Levenshtein Distance 20 (PLD20), calculated using all words with an eSpeak standard British English pronunciation.
PN.Log_PLD20.CMU: The log of PN.PLD20.CMU.
PN.Log_PLD20.eSpeak.br: The log of PN.PLD20.eSpeak.br.
PN.Colthearts_N.CMU: Phonological Neighbourhood size, indexed by Coltheart's N, calculated using all words with a CMU pronunciation.
PN.Colthearts_N.eSpeak.br: Phonological Neighbourhood size, indexed by Coltheart's N, calculated using all words with an eSpeak standard British English pronunciation.
PN.Log_Colthearts_N.CMU: The log of PN.Colthearts_N.CMU.
PN.Log_Colthearts_N.eSpeak.br: The log of PN.Colthearts_N.eSpeak.br.
FAM.Glasgow_Norms: Familiarity ratings (1-7; low-high) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).
FAM.Clark_and_Paivio: Familiarity ratings (1-7; low-high) from the Clark and Paivio (2004) norms (http://doi.org/10.3758/BF03195584).
AoA.Kuperman: Age of Acquisition ratings (1-25; early-late) from the Kuperman et al. (2012) norms (http://doi.org/10.3758/s13428-012-0210-4).
AoA.Glasgow_Norms: Age of Acquisition ratings (1-7; early-late) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).
AoA.BrysbaertBiemiller: Test-based Age of Acquisition (2-14; early-late) from Brysbaert and Biemiller (2017) (http://doi.org/10.3758/s13428-016-0811-4).
CNC.Brysbaert: Concreteness ratings (1-5; low-high) from the Brysbaert et al. (2014) norms (http://doi.org/10.3758/s13428-013-0403-5).
CNC.Glasgow_Norms: Concreteness ratings (1-7; low-high) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).
IMAG.Glasgow_Norms: Imageability ratings (1-7; low-high) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).
IMAG.Clark_and_Paivio: Imageability ratings (1-7; low-high) from the Clark and Paivio (2004) norms (http://doi.org/10.3758/BF03195584).
AROU.Warriner: Arousal ratings (1-9; less-more) from the Warriner et al. (2013) norms (http://doi.org/10.3758/s13428-012-0314-x).
AROU.Glasgow_Norms: Arousal ratings (1-9; less-more) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).
VAL.Warriner: Valence ratings (1-9; more negative-more positive) from the Warriner et al. (2013) norms (http://doi.org/10.3758/s13428-012-0314-x).
VAL.Glasgow_Norms: Valence ratings (1-9; more negative-more positive) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).
DOM.Warriner: Dominance ratings (1-9; less-more) from the Warriner et al. (2013) norms (http://doi.org/10.3758/s13428-012-0314-x).
DOM.Glasgow_Norms: Dominance ratings (1-9; less-more) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).
SIZE.Glasgow_Norms: Size ratings (1-7; smaller-larger) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).
GEND.Glasgow_Norms: Gender ratings (1-7; more female-more male) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).
HUM.EngelthalerHills: Humour ratings (1-5; less funny-more funny) from the Engelthaler and Hills (2018) norms (http://doi.org/10.3758/s13428-017-0930-6).
PREV.Brysbaert: Word prevalence scores from Brysbaert et al. (2019) (http://doi.org/10.3758/s13428-018-1077-9).
PK.Brysbaert: Proportion of people who know the word, from Brysbaert et al. (2019) (http://doi.org/10.3758/s13428-018-1077-9).
RT.BLP: Lexical Decision Response Time according to the British Lexicon Project (http://doi.org/10.3758/s13428-011-0118-4).
Accuracy.BLP: Lexical Decision Accuracy according to the British Lexicon Project (http://doi.org/10.3758/s13428-011-0118-4).
RT.ELP: Lexical Decision Response Time according to the English Lexicon Project (http://doi.org/10.3758/BF03193014).
Accuracy.ELP: Lexical Decision Accuracy according to the English Lexicon Project (http://doi.org/10.3758/BF03193014).
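
Several of the variables above are derived from simpler quantities, and a small sketch may help show what they measure. The code below applies the Zipf formula given above to frequencies per million words, then computes Coltheart's N and an OLD-style measure on a six-word toy list using base R's utils::adist. It assumes the conventional definitions (Coltheart's N as the number of same-length words one substitution away; OLD20 as the mean Levenshtein distance to the 20 closest words) and is not the package's own code. The word list, the helper names fpmw_to_zipf and old_k, and the choice of k = 3 are illustrative assumptions.

```r
# Toy word list standing in for the full LexOPS word list (hypothetical).
words <- c("cat", "cot", "cut", "coat", "dog", "dot")

# Zipf scale as defined above: log10(frequency per million words) + 3.
fpmw_to_zipf <- function(fpmw) log10(fpmw) + 3
zipf_values <- fpmw_to_zipf(c(0.01, 1, 1000))  # gives 1, 3, 6

# Pairwise Levenshtein distances between all words (base R, utils package).
d <- utils::adist(words)
dimnames(d) <- list(words, words)
diag(d) <- NA  # a word is not counted as its own neighbour

# Coltheart's N: same-length words exactly one edit (a substitution) away.
same_length <- outer(nchar(words), nchar(words), "==")
colthearts_n <- rowSums(d == 1 & same_length, na.rm = TRUE)

# OLD-k: mean distance to the k closest other words.
# OLD20 uses k = 20; k = 3 is used here only because the toy list is tiny.
old_k <- function(dist_row, k) mean(sort(dist_row)[seq_len(k)])
old_3 <- apply(d, 1, old_k, k = 3)
```

Lower OLD values and higher Coltheart's N values both indicate a denser neighbourhood. The same approach applied to phoneme strings rather than letter strings would give analogues of the PN.* variables.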