View source: R/classify_tweets.R
classify_tweets | R Documentation |
Function takes a data frame of tweet features as input, and obtains a prediction for each sample using an ensemble classifier.
classify_tweets(
x,
model = ensemble.model,
na.rm = TRUE,
threshold = 0.5,
...,
.predict.type = "prob",
.add = FALSE,
.verbose = TRUE,
.debug = FALSE
)
## Default S3 method:
classify_tweets(
x,
model,
na.rm = TRUE,
threshold = 0.5,
...,
.predict.type = "prob",
.add = FALSE,
.verbose = TRUE,
.debug = FALSE
)
## S3 method for class 'caretEnsemble'
classify_tweets(
x,
model = ensemble.model,
na.rm = TRUE,
threshold = 0.5,
...,
.predict.type = "prob",
.add = FALSE,
.verbose = TRUE,
.debug = FALSE
)
## S3 method for class 'caretList'
classify_tweets(
x,
model = constituent.models,
na.rm = TRUE,
threshold = 0.5,
blend.by = "PR-AUC",
.train.ctrl = trainControl(method = "repeatedcv", number = 10, repeats = 10, search =
"grid", returnData = FALSE, returnResamp = "none", savePredictions = "none",
classProbs = TRUE, summaryFunction = superSumFun, allowParallel = TRUE),
...,
.predict.type = "prob",
.add = FALSE,
.verbose = TRUE,
.debug = FALSE,
.cache.model = FALSE,
.cache.path = getOption("politicaltweets.cache.path")
)
x |
a data frame object of tweet features/predictor variables |
model |
Either
Defaults to the 'caretEnsemble' object |
na.rm |
logical. Remove rows with missing values list-wise?
If |
threshold |
a unit-length double vector in (0, 1), specifying the (predicted) probability threshold used to classify samples as positive (i.e., "political") instances. |
... |
Additional arguments passed to specific method and |
.predict.type |
a unit-length character string, either "prob" (obtain predicted probabilities, the default) or "raw" (obtain predicted classes) |
.add |
logical: Column-bind (add) predictions to |
.verbose |
logical. Print messages to console informing about what the function is doing. |
.debug |
logical. Defaults to |
blend.by |
a unit-length character string determining the evaluation metric based on which constituent models (base learners) should be blended into the ensemble classifier (see section "Ensemble classifier") |
.train.ctrl |
a list object created by calling |
.cache.model |
logical. Cache ensemble classifiers
obtained from |
.cache.path |
unit-length character, specifying where to write cached ensemble classifiers
if |
classify_tweets can handle two types of model input:
Lists of pre-trained base learner models: This is the default behavior if the input to argument model is a 'caretList' object (i.e., a list of pre-trained base learners). In this case, the base learners are first "blended" into a greedy ensemble classifier, and the resulting ensemble model is then used to classify samples in x.
Pre-trained ensemble classifiers: If the input to argument model is a 'caretEnsemble' object, this ensemble model is directly used to classify samples in x.
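The routing between the two input types can be illustrated with a minimal S3 sketch in base R. The generic describe_model and the stand-in objects are hypothetical and only carry a class attribute; the error in the default method is likewise an assumption, since this page does not say what the package's default method does.

```r
# Hypothetical generic that mimics dispatch on the class of `model`.
# Passing `model` to UseMethod() makes R dispatch on it rather than on `x`.
describe_model <- function(x, model, ...) UseMethod("describe_model", model)

# Assumption: reject unsupported inputs (the real default method may differ)
describe_model.default <- function(x, model, ...) {
  stop("`model` must be a 'caretList' or 'caretEnsemble' object")
}

describe_model.caretList <- function(x, model, ...) {
  "blend base learners into an ensemble, then predict"
}

describe_model.caretEnsemble <- function(x, model, ...) {
  "predict directly with the pre-trained ensemble"
}

# Stand-in objects that only carry the class attribute (not real models)
base_learners <- structure(list(), class = "caretList")
ensemble <- structure(list(), class = "caretEnsemble")

x <- data.frame(feature = 1:3)
route_list <- describe_model(x, base_learners)
route_ensemble <- describe_model(x, ensemble)
```

Dispatching on model rather than x keeps a single entry point while letting each model type decide whether a blending step is needed.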
A data frame of predictions.
Check attribute "removed.rows" for indexes of removed rows and "removed.rows.nas" for corresponding missing value information if na.rm = TRUE.
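As a rough illustration of how row removal can be recorded in attributes, the base R sketch below drops incomplete rows and stores their indexes. The helper drop_incomplete is hypothetical, and the list format used for "removed.rows.nas" is an assumption; the package may store this information differently.

```r
# Hypothetical helper: drop incomplete rows and record what was dropped
drop_incomplete <- function(x) {
  incomplete <- !stats::complete.cases(x)
  removed <- which(incomplete)
  out <- x[!incomplete, , drop = FALSE]
  attr(out, "removed.rows") <- removed
  # Assumed format: for each removed row, the names of its missing columns
  attr(out, "removed.rows.nas") <- lapply(removed, function(i) {
    names(x)[vapply(x, function(col) is.na(col[i]), logical(1))]
  })
  out
}

features <- data.frame(a = c(1, NA, 3, 4), b = c("u", "v", NA, "x"))
kept <- drop_incomplete(features)

# Inspect which rows were removed and why
attr(kept, "removed.rows")
attr(kept, "removed.rows.nas")
```

The same attr() calls are how the attributes named above would be read from the returned data frame of predictions.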
default: Default method (when model is neither a 'caretList' nor a 'caretEnsemble' object)
caretEnsemble: Method when model is a 'caretEnsemble' object (i.e., a pre-trained ensemble model)
caretList: Method when model is a 'caretList' object (i.e., a list of pre-trained base learner models)
classify_tweets when model is a 'caretList' object
By default, four constituent models are used to create the ensemble classifier (see ?constituent.models):
glmnet: a generalized linear model (GLM) with Elastic-Net regularization (glmnet)
svmRadial: a Support Vector Machine (SVM) with a radial kernel (ksvm with kernel = "rbfdot")
ranger: a Random Forest (ranger)
xgbTree: an eXtreme Gradient Boosting (XGBoost) machine (xgboost with learner = "tree")
The ensemble classifier is obtained by "blending" constituent models using a generalized linear model (GLM).
This is done by a call to the caretEnsemble function (see vignette("caretEnsemble-intro", package = "caretEnsemble")).
The blend.by argument determines which evaluation metric is used to "blend" constituent models.
It is passed to the metric argument when calling caretEnsemble, which, in turn, forwards metric to train when training the GLM with method = "glm".
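The idea of blending with a GLM can be sketched in base R without caret or caretEnsemble. The data below are simulated and the two "base learners" are hypothetical; the sketch omits the resampling and metric handling that caretEnsemble and train perform, so it shows only the stacking step and is not the package's implementation.

```r
set.seed(1)
n <- 200
y <- rbinom(n, 1, 0.5)

# Simulated held-out probabilities from two hypothetical base learners;
# both are informative about y, the first more strongly than the second
p1 <- plogis(1.5 * (2 * y - 1) + rnorm(n))
p2 <- plogis(1.0 * (2 * y - 1) + rnorm(n))
stack_data <- data.frame(y = y, p1 = p1, p2 = p2)

# The blender is a logistic regression on the base learners' probabilities
blender <- glm(y ~ p1 + p2, data = stack_data, family = binomial())

# Blended probabilities for the positive class
blended <- predict(blender, newdata = stack_data, type = "response")
```

The fitted coefficients act as weights on the base learners, which is what makes the GLM a "blend" rather than a simple average.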
x
To classify samples in x, the ensemble model is passed to the object argument when calling caretEnsemble's predict method.
By default (.predict.type = "prob"), predicted probabilities for the "yes" (political) class are obtained, and a classification into "yes" and "no" is induced based on the threshold (default is .5).
That is, all samples with a predicted probability ≥ threshold are classified as "yes" instances.
Alternatively, you can directly obtain an assignment into classes by setting .predict.type = "raw".
CAUTION: In the latter case, threshold will have no effect, and the default threshold of .5 is always used.
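The thresholding rule described above can be sketched in a few lines of base R. The helper classify_by_threshold and the example probabilities are hypothetical and only illustrate the "probability ≥ threshold" rule, not the package's internal code.

```r
# Example predicted probabilities for the "yes" (political) class
probs <- c(0.10, 0.50, 0.73, 0.49)

# Hypothetical helper: label samples "yes" when probability >= threshold
classify_by_threshold <- function(p, threshold = 0.5) {
  stopifnot(length(threshold) == 1, threshold > 0, threshold < 1)
  factor(ifelse(p >= threshold, "yes", "no"), levels = c("no", "yes"))
}

default_labels <- classify_by_threshold(probs)
strict_labels <- classify_by_threshold(probs, threshold = 0.75)
```

Raising the threshold trades recall for precision: fewer samples are labeled "yes", but those that are have higher predicted probabilities.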