All.steps_Dictionaries: Unprocessed Full dictionary codings
In gandalfnicolas/SADCAT: Dictionary creation with stereotype content dictionaries

Description

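Unprocessed full dictionary codings, including the vector (word embedding) codings. The word values are provided at several degrees of preprocessing, so the dictionary can be matched against text at whichever level is feasible (e.g., use a less processed level if the text is too extensive to preprocess).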

Usage

All.steps_Dictionaries

Format

A data frame with 14449 rows and 1499 variables:

values: words as obtained from the literature or WordNet, with no preprocessing
values0: lower-case word values
values1: lower-case word values, no spaces or symbols
values2: lower-case word values, no spaces or symbols, lemmatized
values3: lower-case word values, no spaces or symbols, lemmatized, with any ending "s" removed (so the results are not necessarily real words; these are the values averaged over in the final dictionaries)
_dict: variables ending in _dict indicate whether the word is (1) or is not (0) in the dictionary. With an additional _lo, the variable codes whether the word is in the dictionary and low; with an additional _hi, it codes whether the word is in the dictionary and high (i.e., it combines the _dict and _dir variables)
_dir: variables ending in _dir indicate whether the word is high (1), neutral (0), or low (-1) in the dictionary; e.g., "friendly" is high for sociability and "unfriendly" is low. Coded as NA if the word is not in the corresponding dictionary
fasttext: variables starting with fasttext are the word embedding dimensions for fastText (2 million word vectors trained with subword information on Common Crawl; https://fasttext.cc/docs/en/english-vectors.html)
Glove: variables starting with Glove are the word embedding dimensions for GloVe trained on Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors; https://nlp.stanford.edu/projects/glove/)
Word2vec: variables starting with Word2vec are the word embedding dimensions for Word2vec trained on Google News (https://code.google.com/archive/p/word2vec/)
USE: variables starting with USE are the word embedding dimensions for Universal Sentence Encoder trained on Common Crawl (https://arxiv.org/abs/1803.11175)
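
Examples

The column conventions above can be illustrated with a small sketch. It does not load the packaged data: it builds a toy data frame whose column names (Sociability_dict, Sociability_dir, fasttext1, and so on) are hypothetical stand-ins that follow the documented suffixes and prefixes. The lower-casing and symbol-stripping steps are only a base R approximation of values0 and values1, not necessarily the procedure used to build the dataset.

```r
# Toy stand-in for All.steps_Dictionaries. All column names and numbers
# here are hypothetical; only the suffix/prefix conventions come from the
# Format section.
toy <- data.frame(
  values           = c("Friendly", "un-friendly", "Table"),
  Sociability_dict = c(1, 1, 0),    # 1 = word is in the dictionary
  Sociability_dir  = c(1, -1, NA),  # NA because "Table" is not in it
  fasttext1        = c(0.12, -0.30, 0.05),
  fasttext2        = c(0.40, 0.22, -0.11),
  Glove1           = c(-0.07, 0.19, 0.33),
  stringsAsFactors = FALSE
)

# Approximate the first two preprocessing levels with base R
# (an assumption about how values0 and values1 relate to values).
toy$values0 <- tolower(toy$values)
toy$values1 <- gsub("[^a-z0-9]", "", toy$values0)

# Combine membership (_dict) and direction (_dir) into high/low indicators.
# %in% treats NA as "no match", so words outside the dictionary get 0.
toy$Sociability_dict_hi <- as.integer(
  toy$Sociability_dict == 1 & toy$Sociability_dir %in% 1
)
toy$Sociability_dict_lo <- as.integer(
  toy$Sociability_dict == 1 & toy$Sociability_dir %in% -1
)

# Select the embedding dimensions of one model by column-name prefix.
fasttext_cols   <- names(toy)[startsWith(names(toy), "fasttext")]
fasttext_matrix <- as.matrix(toy[, fasttext_cols])
```

With the real data, the same prefix selection (for example startsWith(names(All.steps_Dictionaries), "Glove")) should isolate the embedding columns of a single model, assuming the column names follow the prefixes listed in the Format section.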