All.steps_Dictionaries: Unprocessed Full dictionary codings

All.steps_Dictionaries    R Documentation

Unprocessed Full dictionary codings

Description

Unprocessed full dictionary codings, including the vector codings. May be used with text at different degrees of preprocessing (e.g., when the text is too extensive to preprocess).

Usage

All.steps_Dictionaries

Format

A data frame with 14449 rows and 1499 variables:

values

Words as obtained from the literature or WordNet. No preprocessing.

values0

lower-case word values

values1

lower-case word values, no spaces or symbols

values2

lower-case word values, no spaces or symbols, lemmatized

values3

lower-case word values, no spaces or symbols, lemmatized, with no ending Ss (the results are not always real words; these are the values averaged over in the final dictionaries)

_dict

variables ending in _dict indicate if the word is (1) or is not (0) in the dictionary. If accompanied by _lo, the variable codes whether the word is low and in the dictionary; if accompanied by _hi, it codes whether the word is high and in the dictionary (i.e., it combines the _dict and _dir variables)

_dir

variables ending in _dir indicate if the word is high (1), neutral (0), or low (-1) in the dictionary; e.g., friendly is high for sociability and unfriendly is low. Coded as NA if the word is not in the corresponding dictionary

fasttext

variables starting in fasttext are the word embedding dimensions for fastText: 2 million word vectors trained with subword information on Common Crawl (https://fasttext.cc/docs/en/english-vectors.html)

Glove

variables starting in Glove are the word embedding dimensions for GloVe trained on Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors; https://nlp.stanford.edu/projects/glove/)

Word2vec

variables starting in Word2vec are the word embedding dimensions for Word2vec trained on Google News (https://code.google.com/archive/p/word2vec/)

USE

variables starting in USE are the word embedding dimensions for Universal Sentence Encoder trained on Common Crawl (https://arxiv.org/abs/1803.11175)

...
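
The values0 to values3 columns correspond to increasingly aggressive normalization. The sketch below approximates those steps in base R for a few made-up words, so that tokens from a text can be brought to a chosen level before matching. It is an illustration under stated assumptions, not the package's own pipeline: the helper name normalize_word is hypothetical, lemmatization (values2) is skipped because it needs an external lemmatizer, and the exact symbol and trailing-s rules used to build the data may differ.

```r
# Hypothetical helper: approximates the values0, values1 and values3 steps.
# Lemmatization (values2) is intentionally omitted; it needs an external lemmatizer.
normalize_word <- function(x, level = 1) {
  out <- tolower(x)                                   # like values0: lower case
  if (level >= 1) out <- gsub("[^a-z0-9]", "", out)   # like values1: drop spaces and symbols
  if (level >= 3) out <- sub("s+$", "", out)          # like values3: drop ending s characters (assumed rule)
  out
}

words <- c("Warm-Hearted", "Friends", "Business")
normalize_word(words, level = 0)
normalize_word(words, level = 1)
normalize_word(words, level = 3)
```

With the data loaded, an expression such as normalize_word(tokens, 1) %in% All.steps_Dictionaries$values1 would then test membership at the values1 level, where tokens is a character vector supplied by the user.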

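The relation between the _dict, _dir, _hi and _lo codings, and the prefix convention for the embedding columns, can be illustrated on a toy data frame. The column names below (Sociability_dict, Sociability_dir, fasttext1, Glove1, and so on) are made up for illustration and may not match the real variable names, and the way the _hi and _lo flags are derived here is an assumption based on the descriptions above.

```r
# Toy stand-in for All.steps_Dictionaries; all column names are hypothetical.
toy <- data.frame(
  values3          = c("friendly", "unfriendly", "table"),
  Sociability_dict = c(1, 1, 0),      # 1 = word is in the dictionary
  Sociability_dir  = c(1, -1, NA),    # 1 high, 0 neutral, -1 low, NA if not in dictionary
  fasttext1 = c(0.12, -0.40, 0.05),   # made-up embedding values
  fasttext2 = c(0.30, 0.22, -0.11),
  Glove1    = c(-0.07, 0.18, 0.44),
  stringsAsFactors = FALSE
)

# Assumed combination: "in the dictionary AND high" / "in the dictionary AND low".
# %in% treats NA as no match, so words outside the dictionary get 0.
toy$Sociability_dict_hi <- as.integer(toy$Sociability_dict == 1 & toy$Sociability_dir %in% 1)
toy$Sociability_dict_lo <- as.integer(toy$Sociability_dict == 1 & toy$Sociability_dir %in% -1)

# Select all columns for one embedding model by prefix.
fasttext_cols <- grep("^fasttext", names(toy), value = TRUE)
fasttext_mat  <- as.matrix(toy[, fasttext_cols])
```

The same grep pattern with "^Glove", "^Word2vec" or another prefix would pull out the columns for the other embedding models, provided the real column names follow the prefixes described above.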

gandalfnicolas/SADCAT documentation built on June 8, 2024, 6:26 a.m.