README.md
In tm4ss/tmca.classify: Classification and active learning for text (tmca.classify)

tmca.classify

R package for proportional classification and active learning

Two experiments are provided:

Sentiment Analysis: demo(active_learning_sentiments)
Party Manifestos: demo(active_learning_manifestos)

The first experiment conducts repeated active learning of sentiment classes to learn about stability of the process and influence of parameters (e.g. initial training size).

The second experiment conducts active learning of party manifesto categories. It employs ngram features and LDA features. Feature extraction may take a while depending on your hardware. Performance of the process is evaluated by predicting category proportions in party manifestos correctly.

For details about the example data, see the README in the data folder.

To conduct an own active learning, here is some example code:

library("tmca.classify")

# active learning examples for sentiments
data("sentiments")
corpus <- sentiment_data$SENTENCE
labels <- factor(rep(NA, length(corpus)), levels = c("Positive", "Negative"))
my_classification <- tmca_classify(corpus = corpus, labels = labels)
my_classification$create_initial_trainingset(50)
my_classification$active_learning(stop_threshold = 0.99, positive_class = "Positive", max_iterations = 10)
my_classification$plot_progress()
final_labels <- my_classification$classify()

# proportions
table(final_labels) / length(final_labels)
tmca_proportions(final_labels, sentiment_data$LABEL, facets = sentiment_data$SOURCE, positive_class = "Positive")

Since gold labels for the example data are known, we can compare predicted and true proportions for the different review sources (Yelp, Amazon, Imdb).

tm4ss/tmca.classify documentation built on June 24, 2019, 12:37 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com