lexops: LexOPS' inbuilt variables

lexops R Documentation

LexOPS' inbuilt variables

Description

A data frame containing 68 variables. When this data frame is used in a generate pipeline, the variables used from it can be cited with the cite_design function. The variables included in LexOPS are not intended to be exhaustive. Instead, they provide some useful, frequently used variables and illustrative examples. The LexOPS functions will accept any data frame with a similar structure to LexOPS::lexops (one word or stimulus per row, with features stored in other columns). Other datasets can easily be joined to it, for example with the dplyr join functions.
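A minimal sketch of the kind of join described above. It assumes a small hypothetical stand-in for LexOPS::lexops (lexops_toy) and a hypothetical table of extra norms (my_norms, with an invented my_rating column). Base R's merge() is used so the example runs without extra packages. With dplyr loaded, dplyr::left_join(lexops_toy, my_norms, by = "string") should return the same rows.

```r
# Hypothetical stand-in for LexOPS::lexops: one word per row,
# features in other columns (values are invented for illustration).
lexops_toy <- data.frame(
  string = c("cat", "dog", "house"),
  Length = c(3L, 3L, 5L)
)

# Hypothetical additional norms keyed by the same `string` column.
my_norms <- data.frame(
  string = c("cat", "house", "zebra"),
  my_rating = c(6.1, 4.3, 5.0)
)

# Left join: keep every row of lexops_toy and add my_rating where available.
# Words missing from my_norms get NA; words only in my_norms are dropped.
joined <- merge(lexops_toy, my_norms, by = "string", all.x = TRUE)
print(joined)
```

Keeping every row of the word list means words with no value in the added dataset get NA, so they can be filtered out explicitly rather than silently disappearing.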

Usage

lexops

Format

A data frame with 262532 rows and 68 variables:

string

Strings (words/lemmas).

CMU.1letter

One-letter ARPABET representations of the main (North American) pronunciation according to the CMU Pronouncing Dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict).

CMU.PrN

Number of possible (North American) pronunciations according to the CMU Pronouncing Dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict).

eSpeak.br_1letter

One-letter representations of the standard British pronunciation generated by the eSpeak speech synthesiser (http://espeak.sourceforge.net/).

eSpeak.br_IPA

International Phonetic Alphabet (IPA) representations of the standard British pronunciation generated by the eSpeak speech synthesiser (http://espeak.sourceforge.net/).

Zipf.SUBTLEX_UK

Zipf frequencies (log10(frequency_per_million)+3) calculated from UK subtitles (https://doi.org/10.1080/17470218.2013.850521).

Zipf.SUBTLEX_US

Zipf frequencies (log10(frequency_per_million)+3) calculated from US subtitles (https://doi.org/10.3758/BRM.41.4.977).

Zipf.BNC.Spoken

Zipf frequencies (log10(frequency_per_million)+3) calculated from the spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).

Zipf.BNC.Written

Zipf frequencies (log10(frequency_per_million)+3) calculated from the written texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).

Zipf.BNC.All

Zipf frequencies (log10(frequency_per_million)+3) calculated from both the written and spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
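The Zipf transformation used in the columns above is a one-line conversion from frequency per million words. A small sketch in R, with the inverse included for convenience (the helper names are hypothetical, not LexOPS functions):

```r
# Zipf scale as defined above: log10(frequency per million words) + 3.
fpmw_to_zipf <- function(fpmw) log10(fpmw) + 3

# Inverse: recover frequency per million words from a Zipf value.
zipf_to_fpmw <- function(zipf) 10^(zipf - 3)

fpmw_to_zipf(c(0.01, 1, 1000))   # 1 3 6
zipf_to_fpmw(4)                  # 10 (per million words)
```

A frequency of 0 gives -Inf under this plain formula, so zero counts have to be handled (for example, by smoothing) before conversion.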

fpmw.SUBTLEX_UK

Frequencies per million words calculated from UK subtitles (https://doi.org/10.1080/17470218.2013.850521).

fpmw.SUBTLEX_US

Frequencies per million words calculated from US subtitles (https://doi.org/10.3758/BRM.41.4.977).

fpmw.BNC.Spoken

Frequencies per million words calculated from the spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).

fpmw.BNC.Written

Frequencies per million words calculated from the written texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).

fpmw.BNC.All

Frequencies per million words calculated from both the written and spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).

PoS.SUBTLEX_UK

Dominant parts of speech according to SUBTLEX-UK (https://doi.org/10.1080/17470218.2013.850521).

PoS.BNC.Spoken

Dominant parts of speech according to an analysis of the spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).

PoS.BNC.Written

Dominant parts of speech according to an analysis of the written texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).

PoS.BNC.All

Dominant parts of speech according to an analysis of both the written and spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).

PoS.ELP

Dominant parts of speech according to the English Lexicon Project (http://doi.org/10.3758/BF03193014).

Length

Number of characters in the string.

BG.SUBTLEX_UK

Mean character bigram probabilities for each string, calculated using frequency data from UK subtitles (https://doi.org/10.1080/17470218.2013.850521).

BG.SUBTLEX_US

Mean character bigram probabilities for each string, calculated using frequency data from US subtitles (https://doi.org/10.3758/BRM.41.4.977).

BG.BNC.Spoken

Mean character bigram probabilities for each string, calculated using frequency data from the spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).

BG.BNC.Written

Mean character bigram probabilities for each string, calculated using frequency data from the written texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).

BG.BNC.All

Mean character bigram probabilities for each string, calculated using frequency data from both the written and spoken texts of the British National Corpus (http://www.natcorp.ox.ac.uk/).
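This page does not spell out how the bigram probabilities were derived, so the sketch below assumes one plausible definition: a bigram's probability is its frequency-weighted count across a corpus word list divided by the total frequency-weighted bigram count, and a string's value is the mean over its bigrams. The toy word list, frequencies, and helper names are invented, and this is not necessarily how the BG.* columns were computed.

```r
# Split a string into its overlapping character bigrams: "cat" -> "ca", "at".
bigrams <- function(s) {
  n <- nchar(s)
  if (n < 2) return(character(0))
  substring(s, 1:(n - 1), 2:n)
}

# ASSUMPTION: probability of a bigram = its frequency-weighted count
# divided by the total frequency-weighted count of all bigrams.
bigram_probs <- function(words, freqs) {
  bg_list <- lapply(words, bigrams)
  all_bg <- unlist(bg_list)
  # Repeat each word's frequency once per bigram it contains.
  weights <- rep(freqs, lengths(bg_list))
  totals <- vapply(split(weights, all_bg), sum, numeric(1))
  totals / sum(totals)
}

# Mean probability of a string's bigrams; unseen bigrams count as 0.
# Strings shorter than 2 characters have no bigrams and return NaN.
mean_bigram_prob <- function(s, probs) {
  p <- probs[bigrams(s)]
  p[is.na(p)] <- 0
  mean(p)
}

# Toy corpus: word list with hypothetical frequencies.
probs <- bigram_probs(c("cat", "hat", "act"), c(10, 5, 1))
mean_bigram_prob("cat", probs)   # (10/32 + 15/32) / 2 = 0.390625
```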

ON.OLD20

Orthographic Neighbourhood size, indexed by OLD20 (the mean orthographic Levenshtein distance from a word to its 20 closest words), calculated using all words in the LexOPS database. Higher values indicate a sparser neighbourhood.

ON.Colthearts_N

Orthographic Neighbourhood size, indexed by Coltheart's N, calculated using all words in the LexOPS database.

ON.Log_OLD20

The log of ON.OLD20.

ON.Log_Colthearts_N

The log of ON.Colthearts_N.
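Both orthographic measures can be computed from Levenshtein distances. The sketch below assumes a toy lexicon and hypothetical helper names, and uses base R's adist() for the distances. OLD20 is the mean distance to the 20 closest words, so k is exposed as a parameter and set to 3 here only because the toy lexicon is tiny. It illustrates the definitions, not necessarily the exact procedure used for the ON.* columns (for example, how ties or duplicate strings are handled).

```r
# Mean Levenshtein distance from `target` to its k closest other words.
# OLD20 corresponds to k = 20.
old_k <- function(target, lexicon, k = 20) {
  others <- setdiff(lexicon, target)
  d <- adist(target, others)[1, ]
  mean(head(sort(d), k))
}

# Coltheart's N: other words of the same length differing by exactly one
# letter substitution (Levenshtein distance 1 between equal-length strings).
colthearts_n <- function(target, lexicon) {
  others <- setdiff(lexicon, target)
  same_len <- others[nchar(others) == nchar(target)]
  if (length(same_len) == 0) return(0L)
  sum(adist(target, same_len)[1, ] == 1)
}

lexicon <- c("cat", "hat", "cot", "cut", "cast", "at", "dog")
old_k("cat", lexicon, k = 3)     # 1
colthearts_n("cat", lexicon)     # 3 (hat, cot, cut)
```

Note that "cast" and "at" are one edit from "cat" and so count towards OLD-style distances, but they are excluded from Coltheart's N because their length differs.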

Syllables.CMU

Number of syllables in the main pronunciation listed in the CMU Pronouncing Dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict).

Syllables.eSpeak.br

Number of syllables, according to the standard British pronunciation calculated by the eSpeak speech synthesiser (http://espeak.sourceforge.net/).

Phonemes.CMU

Number of phonemes in the main pronunciation listed in the CMU Pronouncing Dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict).

Phonemes.eSpeak.br

Number of phonemes, according to the standard British pronunciation calculated by the eSpeak speech synthesiser (http://espeak.sourceforge.net/).
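In the CMU Pronouncing Dictionary's own ARPABET format, phonemes are space-separated and every vowel carries a stress digit (0, 1 or 2), so phonemes and syllables can be counted directly. The sketch below works on that multi-character format rather than on the one-letter CMU.1letter column, and the helper names are hypothetical; it is not necessarily how Syllables.CMU or Phonemes.CMU were derived.

```r
# Count phonemes in a space-separated ARPABET transcription.
count_phonemes <- function(arpabet) {
  length(strsplit(trimws(arpabet), "\\s+")[[1]])
}

# Count syllables: each vowel (the only phonemes with a stress digit)
# forms one syllable nucleus.
count_syllables <- function(arpabet) {
  phones <- strsplit(trimws(arpabet), "\\s+")[[1]]
  sum(grepl("[0-2]$", phones))
}

count_phonemes("HH AH0 L OW1")   # 4
count_syllables("HH AH0 L OW1")  # 2
```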

Rhyme.CMU

Rhyme sound of the main pronunciation listed in the CMU Pronouncing Dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict).

Rhyme.eSpeak.br

Rhyme sound of the standard British pronunciation calculated by the eSpeak speech synthesiser (http://espeak.sourceforge.net/).

PN.PLD20.CMU

Phonological Neighbourhood size, indexed by PLD20 (the mean phonological Levenshtein distance from a word to its 20 closest words), calculated using all words with a CMU pronunciation.

PN.PLD20.eSpeak.br

Phonological Neighbourhood size, indexed by PLD20 (the mean phonological Levenshtein distance from a word to its 20 closest words), calculated using all words with an eSpeak standard British English pronunciation.

PN.Log_PLD20.CMU

The log of PN.PLD20.CMU.

PN.Log_PLD20.eSpeak.br

The log of PN.PLD20.eSpeak.br.

PN.Colthearts_N.CMU

Phonological Neighbourhood size, indexed by Coltheart's N, calculated using all words with a CMU pronunciation.

PN.Colthearts_N.eSpeak.br

Phonological Neighbourhood size, indexed by Coltheart's N, calculated using all words with an eSpeak standard British English pronunciation.

PN.Log_Colthearts_N.CMU

The log of PN.Colthearts_N.CMU.

PN.Log_Colthearts_N.eSpeak.br

The log of PN.Colthearts_N.eSpeak.br.
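The phonological measures count edits over phonemes rather than letters. Recoding each phoneme as a single symbol, as one-letter representations do, lets a character-level edit distance such as base R's adist() count phoneme edits. The sketch below is a hypothetical illustration: to_one_symbol() is an invented helper that builds its own ad hoc code from space-separated ARPABET transcriptions (with stress digits removed, a choice made for this sketch), and the resulting phoneme-level distances are compared with letter-level ones.

```r
# Hypothetical helper: recode space-separated ARPABET transcriptions so that
# each phoneme becomes exactly one character.
to_one_symbol <- function(prons) {
  # Drop stress digits so that, e.g., AE0 and AE1 count as the same phoneme.
  tokens <- strsplit(gsub("[0-2]", "", prons), " ", fixed = TRUE)
  inventory <- sort(unique(unlist(tokens)))
  # Assign each distinct phoneme its own character, starting at "A".
  codes <- intToUtf8(64L + seq_along(inventory), multiple = TRUE)
  names(codes) <- inventory
  vapply(tokens, function(tok) paste(codes[tok], collapse = ""), character(1))
}

words <- c("cat", "cut", "catch")
prons <- c("K AE1 T", "K AH1 T", "K AE1 CH")
coded <- to_one_symbol(prons)

# Phoneme-level distances from "cat": "cut" and "catch" are each one edit away.
phon_d <- adist(coded[1], coded)[1, ]   # 0 1 1
# Letter-level distances: "catch" is two edits away from "cat".
orth_d <- adist(words[1], words)[1, ]   # 0 1 2
```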

FAM.Glasgow_Norms

Familiarity ratings (1-7; low-high) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).

FAM.Clark_and_Paivio

Familiarity ratings (1-7; low-high) from the Clark and Paivio (2004) norms (http://doi.org/10.3758/BF03195584).

AoA.Kuperman

Age of Acquisition ratings (1-25; early-late) from the Kuperman et al. (2012) norms (http://doi.org/10.3758/s13428-012-0210-4).

AoA.Glasgow_Norms

Age of Acquisition ratings (1-7; early-late) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).

AoA.BrysbaertBiemiller

Test-based Age of Acquisition (2-14; early-late) from Brysbaert and Biemiller (2017) (http://doi.org/10.3758/s13428-016-0811-4).

CNC.Brysbaert

Concreteness ratings (1-5; low-high) from the Brysbaert et al. (2014) norms (http://doi.org/10.3758/s13428-013-0403-5).

CNC.Glasgow_Norms

Concreteness ratings (1-7; low-high) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).

IMAG.Glasgow_Norms

Imageability ratings (1-7; low-high) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).

IMAG.Clark_and_Paivio

Imageability ratings (1-7; low-high) from the Clark and Paivio (2004) norms (http://doi.org/10.3758/BF03195584).

AROU.Warriner

Arousal ratings (1-9; less-more) from the Warriner et al. (2013) norms (http://doi.org/10.3758/s13428-012-0314-x).

AROU.Glasgow_Norms

Arousal ratings (1-9; less-more) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).

VAL.Warriner

Valence ratings (1-9; more negative-more positive) from the Warriner et al. (2013) norms (http://doi.org/10.3758/s13428-012-0314-x).

VAL.Glasgow_Norms

Valence ratings (1-9; more negative-more positive) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).

DOM.Warriner

Dominance ratings (1-9; less-more) from the Warriner et al. (2013) norms (http://doi.org/10.3758/s13428-012-0314-x).

DOM.Glasgow_Norms

Dominance ratings (1-9; less-more) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).

SIZE.Glasgow_Norms

Size ratings (1-7; smaller-larger) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).

GEND.Glasgow_Norms

Gender ratings (1-7; more female-more male) from the Glasgow Norms (http://doi.org/10.3758/s13428-018-1099-3).

HUM.EngelthalerHills

Humour ratings (1-5; less funny-more funny) from the Engelthaler and Hills (2018) norms (http://doi.org/10.3758/s13428-017-0930-6).

PREV.Brysbaert

Word prevalence scores from Brysbaert et al. (2019) (http://doi.org/10.3758/s13428-018-1077-9).

PK.Brysbaert

Proportion of people who know the word, from Brysbaert et al. (2019) (http://doi.org/10.3758/s13428-018-1077-9).

RT.BLP

Lexical Decision Response Time according to the British Lexicon Project (http://doi.org/10.3758/s13428-011-0118-4).

Accuracy.BLP

Lexical Decision Accuracy according to the British Lexicon Project (http://doi.org/10.3758/s13428-011-0118-4).

RT.ELP

Lexical Decision Response Time according to the English Lexicon Project (http://doi.org/10.3758/BF03193014).

Accuracy.ELP

Lexical Decision Accuracy according to the English Lexicon Project (http://doi.org/10.3758/BF03193014).


JackEdTaylor/LexOPS documentation built on Sept. 10, 2023, 3:09 a.m.