Linear or nonlinear penalized regression of any dependent variable on a large number of sentiment measures and, optionally, other explanatory variables. The function either performs a single regression on all provided variables at once, or computes regressions sequentially, each on a sample of a given size, over a longer time horizon, with associated prediction performance metrics.
sento_model(sento_measures, y, x = NULL, ctr)
sento_measures |
a |
y |
a one-column |
x |
a named |
ctr |
output from a |
Models are computed using elastic net regularization as implemented in the glmnet package, to account for the high dimensionality of the sentiment measures. Independent variables are normalized in the regression process, but coefficients are returned in their original space. For a helpful introduction to glmnet, we refer to its vignette. The optimal elastic net parameters lambda and alpha are calibrated either through a user-specified information criterion or through cross-validation (based on the "rolling forecasting origin" principle, using the train function from the caret package). In the latter case, the training metric is automatically set to "RMSE" for a linear model and to "Accuracy" for a logistic model. For the sake of user-friendliness, we suppress many of the details that can be supplied to the glmnet and train functions we rely on.
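The remark that predictors are normalized while coefficients are reported in their original space can be illustrated with a small base R sketch. It uses simulated data and plain least squares via lm instead of the penalized fit, so it shows only the back-transformation of coefficients, not the internals of glmnet or sento_model. All variable names are illustrative assumptions.

```r
# Simulated data: two predictors on very different scales (illustrative only).
set.seed(1)
n <- 200
X <- cbind(x1 = rnorm(n, mean = 50, sd = 10),
           x2 = rnorm(n, mean = 0, sd = 0.1))
y <- 2 + 0.3 * X[, "x1"] - 4 * X[, "x2"] + rnorm(n)

# Standardize each column: subtract its mean, divide by its standard deviation.
mu <- colMeans(X)
sigma <- apply(X, 2, sd)
Z <- scale(X, center = mu, scale = sigma)

# Fit on the standardized predictors. Plain least squares is an assumption made
# for simplicity; a penalized fit would add the elastic net penalty here.
fit_std <- lm(y ~ Z)
a_std <- coef(fit_std)[1]
b_std <- coef(fit_std)[-1]

# Map the coefficients back to the original scale of X.
b_orig <- b_std / sigma
a_orig <- a_std - sum(b_std * mu / sigma)

# Compare with a fit on the raw predictors.
fit_raw <- lm(y ~ X)
same <- all.equal(unname(c(a_orig, b_orig)), unname(coef(fit_raw)))
same
```

For unpenalized least squares the two sets of coefficients coincide. With a penalty they generally differ from a fit on raw predictors, which is why normalizing before penalizing matters when the measures have different scales.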
If ctr$do.iter = FALSE, a sento_model object, which is a list containing:
reg |
optimized regression, i.e., a model-specific glmnet object, including for example the estimated coefficients. |
model |
the input argument |
alpha |
calibrated alpha. |
lambda |
calibrated lambda. |
trained |
output from |
ic |
a |
dates |
sample reference dates as a two-element |
nVar |
a vector of size two, with respectively the number of sentiment measures, and the number of other explanatory variables inputted. |
discarded |
a named |
If ctr$do.iter = TRUE, a sento_modelIter object, which is a list containing:
models |
all sparse regressions, i.e., separate |
alphas |
calibrated alphas. |
lambdas |
calibrated lambdas. |
performance |
a |
Samuel Borms, Keven Bluteau
ctr_model, glmnet, train, attributions
## Not run:
data("usnews", package = "sentometrics")
data("list_lexicons", package = "sentometrics")
data("list_valence_shifters", package = "sentometrics")
data("epu", package = "sentometrics")
set.seed(505)
# construct a sento_measures object to start with
corpusAll <- sento_corpus(corpusdf = usnews)
corpus <- quanteda::corpus_subset(corpusAll, date >= "2004-01-01")
l <- sento_lexicons(list_lexicons[c("LM_en", "HENRY_en")])
ctr <- ctr_agg(howWithin = "counts", howDocs = "proportional",
               howTime = c("equal_weight", "linear"),
               by = "month", lag = 3)
sento_measures <- sento_measures(corpus, l, ctr)
# prepare y and other x variables
y <- epu[epu$date %in% get_dates(sento_measures), "index"]
length(y) == nobs(sento_measures) # TRUE
x <- data.frame(runif(length(y)), rnorm(length(y))) # two other (random) x variables
colnames(x) <- c("x1", "x2")
# a linear model based on the Akaike information criterion
ctrIC <- ctr_model(model = "gaussian", type = "AIC", do.iter = FALSE, h = 4,
                   do.difference = TRUE)
out1 <- sento_model(sento_measures, y, x = x, ctr = ctrIC)
# attribution and prediction as post-analysis
attributions1 <- attributions(out1, sento_measures,
                              refDates = get_dates(sento_measures)[20:25])
plot(attributions1, "features")
nx <- nmeasures(sento_measures) + ncol(x)
newx <- runif(nx) * cbind(data.table::as.data.table(sento_measures)[, -1], x)[30:40, ]
preds <- predict(out1, newx = as.matrix(newx), type = "link")
# an iterative out-of-sample analysis, parallelized
ctrIter <- ctr_model(model = "gaussian", type = "BIC", do.iter = TRUE, h = 3,
                     oos = 2, alphas = c(0.25, 0.75), nSample = 75, nCore = 2)
out2 <- sento_model(sento_measures, y, x = x, ctr = ctrIter)
summary(out2)
# plot predicted vs. realized values
p <- plot(out2)
p
# a cross-validation based model, parallelized
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)
ctrCV <- ctr_model(model = "gaussian", type = "cv", do.iter = FALSE,
                   h = 0, alphas = c(0.10, 0.50, 0.90), trainWindow = 70,
                   testWindow = 10, oos = 0, do.progress = TRUE)
out3 <- sento_model(sento_measures, y, x = x, ctr = ctrCV)
parallel::stopCluster(cl)
foreach::registerDoSEQ()
summary(out3)
# a cross-validation based model for a binomial target
yb <- epu[epu$date %in% get_dates(sento_measures), "above"]
ctrCVb <- ctr_model(model = "binomial", type = "cv", do.iter = FALSE,
                    h = 0, alphas = c(0.10, 0.50, 0.90), trainWindow = 70,
                    testWindow = 10, oos = 0, do.progress = TRUE)
out4 <- sento_model(sento_measures, yb, x = x, ctr = ctrCVb)
summary(out4)
## End(Not run)
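The cross-validation examples pass trainWindow = 70 and testWindow = 10, and the Details describe the "rolling forecasting origin" principle. The base R sketch below shows how such windows could slide over a series. The helper rolling_origin is hypothetical and not part of sentometrics or caret. It assumes a fixed-length training window that moves forward one observation at a time, and the series length of 100 is chosen only for illustration.

```r
# Hypothetical helper: build rolling-origin train/test index sets, assuming a
# fixed-size training window that advances by one observation per split.
rolling_origin <- function(n, trainWindow, testWindow) {
  stopifnot(n >= trainWindow + testWindow)
  starts <- seq_len(n - trainWindow - testWindow + 1)
  lapply(starts, function(s) {
    list(train = s:(s + trainWindow - 1),
         test = (s + trainWindow):(s + trainWindow + testWindow - 1))
  })
}

splits <- rolling_origin(n = 100, trainWindow = 70, testWindow = 10)
length(splits)                       # 21 train/test pairs
range(splits[[1]]$train)             # first training window: 1 to 70
range(splits[[1]]$test)              # the test window right after it: 71 to 80
range(splits[[length(splits)]]$test) # last test window ends at observation 100
```

Each test window lies strictly after its training window, so no future observation is used to tune the parameters that are evaluated on it.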