View source: R/textmodel_lss.R
| textmodel_lss | R Documentation |
Latent Semantic Scaling (LSS) is a semi-supervised algorithm for document scaling based on word embedding.
```r
textmodel_lss(x, ...)

## S3 method for class 'dfm'
textmodel_lss(
  x,
  seeds,
  terms = NULL,
  k = 300,
  slice = NULL,
  weight = "count",
  cache = FALSE,
  simil_method = "cosine",
  engine = c("RSpectra", "irlba", "rsvd"),
  auto_weight = FALSE,
  include_data = FALSE,
  group_data = FALSE,
  verbose = FALSE,
  ...
)

## S3 method for class 'fcm'
textmodel_lss(
  x,
  seeds,
  terms = NULL,
  k = 50,
  max_count = 10,
  weight = "count",
  cache = FALSE,
  simil_method = "cosine",
  engine = "rsparse",
  auto_weight = FALSE,
  verbose = FALSE,
  ...
)

## S3 method for class 'tokens'
textmodel_lss(
  x,
  seeds,
  terms = NULL,
  k = 200,
  min_count = 5,
  engine = "wordvector",
  tolower = TRUE,
  include_data = FALSE,
  group_data = FALSE,
  spatial = TRUE,
  verbose = FALSE,
  ...
)
```
| Argument | Description |
|---|---|
| x | a dfm or fcm created by |
| ... | additional arguments passed to the underlying engine. |
| seeds | a character vector or named numeric vector that contains seed words. If seed words contain "*", they are interpreted as glob patterns. See quanteda::valuetype. |
| terms | a character vector or named numeric vector that specifies the words for which polarity scores will be computed; if a numeric vector, the words' polarity scores will be weighted accordingly; if |
| k | the number of singular values requested from the SVD engine. Only used when |
| slice | a number or indices of the components of word vectors used to compute similarity; |
| weight | weighting scheme passed to |
| cache | if |
| simil_method | specifies the method used to compute similarity between features. The value is passed to |
| engine | select the engine to factorize |
| auto_weight | automatically determine weights to approximate the polarity of terms to seed words. Deprecated. |
| include_data | if |
| group_data | if |
| verbose | show messages if |
| max_count | passed to |
| min_count | the minimum frequency of the words. Words less frequent than this in |
| tolower | if |
| spatial | [experimental] if |
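Because seeds may contain glob patterns, each pattern stands for every matching word in the vocabulary. The base-R sketch below shows how such patterns could expand into concrete seed words, with each match inheriting its pattern's polarity. The vocabulary, the helper expand_seeds(), and the use of utils::glob2rx() are illustrative assumptions; quanteda's own pattern matching (see quanteda::valuetype) may differ in details such as case sensitivity.

```r
# Hypothetical vocabulary standing in for the features of a dfm
vocab <- c("good", "goodness", "goodwill", "nice", "bad", "badly", "nasty", "economy")

# Bipolar seeds; "good*" and "bad*" are glob patterns
seeds <- c("good*" = 1, "nice" = 1, "bad*" = -1, "nasty" = -1)

# Hypothetical helper: expand each pattern into the matching words,
# giving every match the polarity of its pattern
expand_seeds <- function(seeds, vocab) {
  expanded <- lapply(names(seeds), function(pattern) {
    hits <- grep(utils::glob2rx(pattern), vocab, value = TRUE)
    stats::setNames(rep(seeds[[pattern]], length(hits)), hits)
  })
  unlist(expanded)
}

expanded <- expand_seeds(seeds, vocab)
print(expanded)
```

Words that match no pattern, such as "economy" here, are not seeds and receive their polarity only through their similarity to the seed words.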
Latent Semantic Scaling (LSS) is a semisupervised document scaling
method. textmodel_lss() constructs word vectors from user-provided
documents (x) and weights words (terms) based on their semantic
proximity to seed words (seeds). Seed words are words of known polarity
(e.g. sentiment words) that users choose manually. The required
number of seed words is usually 5 to 10 for each end of the scale.
If seeds is a named numeric vector with positive and negative values, a
bipolar model is constructed; if seeds is a character vector, a
unipolar model is constructed. Bipolar models usually perform better in document
scaling because both ends of the scale are defined by the user.
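To make the bipolar case concrete, here is a minimal base-R sketch of the general idea, not LSX's implementation: word vectors come from a truncated SVD of a toy document-feature matrix, and a word's polarity is the weighted sum of its cosine similarities to the seed words. The counts are made up, and two details are assumptions: the word vectors are the right singular vectors scaled by the singular values, and each seed's value is divided by the number of seeds at its end of the scale so that both ends contribute equally. In a unipolar model, every seed would simply share the same positive weight.

```r
# Toy document-feature matrix (rows = documents, columns = words); made-up counts
dfmat <- matrix(
  c(2, 1, 0, 0, 1,
    1, 2, 0, 0, 1,
    0, 0, 2, 1, 1,
    0, 0, 1, 2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("good", "nice", "bad", "nasty", "economy"))
)

# Truncated SVD; k = 2 here (the dfm method defaults to k = 300)
k <- 2
s <- svd(dfmat, nu = k, nv = k)
# Assumption: word vectors = right singular vectors scaled by singular values
wordvec <- s$v %*% diag(s$d[seq_len(k)], nrow = k)
rownames(wordvec) <- colnames(dfmat)

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Bipolar seeds: a named numeric vector with positive and negative values
seeds <- c(good = 1, nice = 1, bad = -1, nasty = -1)
# Assumption: divide each value by the number of seeds at its end of the scale
seed_weight <- seeds / ave(seeds, sign(seeds), FUN = length)

# Polarity of each word = weighted sum of its cosine similarities to the seeds
polarity <- sapply(rownames(wordvec), function(w) {
  sims <- sapply(names(seed_weight), function(sw) cosine(wordvec[w, ], wordvec[sw, ]))
  sum(sims * seed_weight)
})
print(round(polarity, 3))  # good, nice ~ 1; bad, nasty ~ -1; economy ~ 0
```

"economy" occurs equally in both kinds of documents, so it is equally similar to both poles and its polarity is (up to rounding) zero.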
A seed word's polarity score computed by textmodel_lss() tends to diverge
from the original score given by the user, because its score is affected
not only by its own original score but also by the original scores of all other
seed words. If auto_weight = TRUE, the original scores are weighted
automatically using stats::optim() to minimize the squared difference
between the seed words' computed and original scores. The weighted scores are saved
in seed_weighted in the object.
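The auto_weight option is deprecated, but the optimization it describes can be sketched in base R. This sketch assumes, hypothetically, that a seed word's computed score is the sum of its similarities to all seeds multiplied by the seeds' weights (without any normalization LSX may apply), and uses stats::optim() to find weights whose computed scores match the original ones. The similarity matrix is made up for illustration.

```r
# Hypothetical cosine similarities among four seed words
seed_names <- c("good", "nice", "bad", "nasty")
sim <- matrix(
  c(1.0, 0.6, 0.1, 0.2,
    0.6, 1.0, 0.2, 0.1,
    0.1, 0.2, 1.0, 0.5,
    0.2, 0.1, 0.5, 1.0),
  nrow = 4, byrow = TRUE, dimnames = list(seed_names, seed_names)
)
seeds <- c(good = 1, nice = 1, bad = -1, nasty = -1)

# Assumed computed score of each seed under weights w
computed <- function(w) drop(sim %*% w)

# Squared difference between computed and original scores
loss <- function(w) sum((computed(w) - seeds)^2)

# Start from the original scores and let optim() adjust them
fit <- optim(par = seeds, fn = loss, method = "BFGS")
seed_weighted <- fit$par
print(round(seed_weighted, 3))
print(round(computed(seed_weighted), 3))  # close to the original scores
```

With unadjusted weights, each positive seed would score 1.3 and each negative seed -1.2 here, illustrating the divergence described above.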
When x is a tokens or tokens_xptr object, wordvector::textmodel_word2vec
is called internally with type = "skip-gram", and other arguments are passed via the ... argument.
If spatial = TRUE, it returns a spatial model; otherwise, a probabilistic model.
In spatial models, the polarity scores of words are their cosine similarities to the seed words;
in probabilistic models, they are the predicted probabilities that the seed words occur in
their contexts. Probabilistic models are still experimental, so use them with caution.
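The difference between the two kinds of scores can be illustrated with made-up vectors. This sketch assumes a skip-gram model with negative sampling, in which each word has an input vector and a separate context (output) vector, and the probability that a seed occurs in a word's context is modeled as the logistic sigmoid of their dot product. Whether wordvector computes probabilistic scores exactly this way is an assumption, not something stated here.

```r
# Made-up 3-dimensional vectors for illustration only
word <- c(0.2, 0.1, -0.3)  # input vector of the word being scored
seed_input <- rbind(good = c(0.3, 0.0, -0.2), bad = c(-0.1, 0.4, 0.2))
seed_context <- rbind(good = c(0.5, 0.2, -0.4), bad = c(-0.3, 0.6, 0.1))

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
sigmoid <- function(z) 1 / (1 + exp(-z))

# Spatial: cosine similarity between the word's and each seed's input vectors
spatial <- apply(seed_input, 1, function(s) cosine(word, s))

# Probabilistic (assumed form): sigmoid of the dot product between the
# word's input vector and each seed's context vector
probabilistic <- apply(seed_context, 1, function(s) sigmoid(sum(word * s)))

print(round(spatial, 3))
print(round(probabilistic, 3))
```

Both scores rank "good" above "bad" here, but they live on different scales: cosine similarity lies between -1 and 1, while a probability lies between 0 and 1.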
Please visit the package website for examples.
Watanabe, Kohei. 2020. "Latent Semantic Scaling: A Semisupervised Text Analysis Technique for New Domains and Languages." Communication Methods and Measures. doi:10.1080/19312458.2020.1832976.
Watanabe, Kohei. 2017. "Measuring News Bias: Russia's Official News Agency ITAR-TASS' Coverage of the Ukraine Crisis." European Journal of Communication. doi:10.1177/0267323117695735.