Spanishdicts: Full Spanish dictionaries
In gandalfnicolas/SADCAT: Dictionary creation with stereotype content dictionaries

Spanishdicts

R Documentation

Full Spanish dictionaries

Description

Full Spanish dictionaries. Word embedding values are based on English data.

Usage

Spanishdicts

Format

A data frame with the following variables:

Palabra: Spanish word, not stemmed but lightly preprocessed (e.g., no symbols, spaces, or accents)
Palabra_stem: Stemmed version of Palabra
values: Words as obtained from the literature or WordNet, with no preprocessing
values0: lower-case word values
values1: lower-case word values, no spaces or symbols
values2: lower-case word values, no spaces or symbols, lemmatized
values3: lower-case word values, no spaces or symbols, lemmatized, with any trailing "s" removed (so not necessarily real words; these are the values averaged over in the final dictionaries)
_dict: variables ending in _dict indicate whether the word is (1) or is not (0) in the dictionary. With a _lo suffix, the variable codes whether the word is in the dictionary and low; with a _hi suffix, whether the word is in the dictionary and high (i.e., it combines the _dict and _dir variables)
_dir: variables ending in _dir indicate whether the word is high (1), neutral (0), or low (-1) in the dictionary; e.g., "friendly" is high for sociability and "unfriendly" is low. Coded as NA if the word is not in the corresponding dictionary
fasttext: variables starting with fasttext are the word embedding dimensions for fastText (2 million word vectors trained with subword information on Common Crawl; https://fasttext.cc/docs/en/english-vectors.html)
Glove: variables starting with Glove are the word embedding dimensions for GloVe trained on Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors; https://nlp.stanford.edu/projects/glove/)
Word2vec: variables starting with Word2vec are the word embedding dimensions for word2vec trained on Google News (https://code.google.com/archive/p/word2vec/)
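
Examples

The suffix and prefix conventions described under Format can be illustrated on a small stand-in data frame. This is only a sketch: the toy data, the dictionary name "Sociability", the exact column names (Sociability_dict, Sociability_dir, Sociability_dict_hi, Sociability_dict_lo, fasttext1, fasttext2), and the embedding values are all assumptions made for illustration, not taken from the real Spanishdicts object. Check names(Spanishdicts) for the actual column names before adapting it.

```r
# Toy stand-in for Spanishdicts; the real data frame has many more rows and columns.
# "Sociability" and all column names below are hypothetical.
toy <- data.frame(
  Palabra          = c("amable", "antipatico", "mesa"),
  Sociability_dict = c(1, 1, 0),       # 1 = in the dictionary, 0 = not
  Sociability_dir  = c(1, -1, NA),     # 1 = high, -1 = low, NA = not in dictionary
  fasttext1        = c(0.12, -0.05, 0.30),  # made-up embedding values
  fasttext2        = c(0.40, 0.22, -0.10),
  stringsAsFactors = FALSE
)

# Rebuild the combined indicators: in the dictionary AND high (or low).
in_dict <- toy$Sociability_dict == 1
has_dir <- !is.na(toy$Sociability_dir)
toy$Sociability_dict_hi <- as.integer(in_dict & has_dir & toy$Sociability_dir == 1)
toy$Sociability_dict_lo <- as.integer(in_dict & has_dir & toy$Sociability_dir == -1)

# Words coded as high on the (hypothetical) dictionary.
high_words <- toy$Palabra[toy$Sociability_dict_hi == 1]

# Pull out all embedding dimensions for one model by column-name prefix.
ft_cols   <- grep("^fasttext", names(toy), value = TRUE)
ft_matrix <- as.matrix(toy[, ft_cols, drop = FALSE])
```

The same grep() pattern with "^Glove" or "^Word2vec" would select the other embedding families, assuming the prefixes match the Format section exactly.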