plot.sample.size: Plot Classification Accuracy for Short Text Samples

plot.sample.size    R Documentation

Plot Classification Accuracy for Short Text Samples

Description

Plotting method for objects of the class "stylo.results", produced by the function samplesize.penalize. It can be used to show the behavior of short samples in text classification. See the help page of samplesize.penalize for further details.

Usage

## S3 method for class 'sample.size'
plot(x, target = NULL, variable = "diversity",
      trendline = TRUE, observations = FALSE,
      grayscale = FALSE, legend = TRUE, 
      legend_pos = "bottomright", main = "default", ...)

Arguments

x

an object of class "stylo.results" as produced by the function samplesize.penalize.

target

the text to be plotted, given either as its number or as its name as stored in the "stylo.results" object (see the examples below). The two forms are equivalent: a numeric value n selects the n-th text. If no target is specified, the first text is plotted.

variable

choose either "accuracy" to get the classification accuracy, i.e. the ratio of correctly attributed instances to the number of iterations (usually 100, see the help page of samplesize.penalize for further details), or "diversity" to get Simpson's index of class imbalance (this is the default value). The index tells you how consistent the classifier was in its choices.

trendline

because a plot showing every observation can be difficult to read, a trendline can be drawn instead (default: TRUE). The trendlines are produced using the lowess function.

observations

individual observations can be shown together with the trendline (see above). Set this option to TRUE to do so (default: FALSE).

grayscale

if TRUE, the plot is drawn in grayscale rather than in color (default: FALSE).

legend

whether to add a legend explaining the trendlines and/or observations (default: TRUE).

legend_pos

position of the legend: one of "bottomright" (default), "bottomleft", "topright" or "topleft".

main

title of the plot; use it as you would the main argument of the function plot, or leave it as "default" to use the name of the sample, extracted automatically from the "stylo.results" object.

...

further arguments to be passed to plot.
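
The two values accepted by the variable argument can be illustrated on a toy example. The sketch below is not taken from the package: the vector predicted and the name true_class are made up for illustration, and Simpson's index is computed in its common form (the sum of squared class proportions), which is assumed here rather than confirmed to be the exact formula used by samplesize.penalize.

```r
# Hypothetical classifier output (made-up data, not produced by stylo):
# the class assigned to one text sample in each of 10 iterations.
predicted <- c("Woolf", "Woolf", "Joyce", "Woolf", "Woolf",
               "Woolf", "Conrad", "Woolf", "Woolf", "Joyce")
true_class <- "Woolf"

# Accuracy: correctly attributed instances divided by the number of iterations.
accuracy <- mean(predicted == true_class)

# Simpson's index in its common form (an assumption, see the text above):
# the sum of squared class proportions. Values close to 1 mean that the
# classifier kept choosing the same class; lower values mean that its
# choices were spread over several classes.
proportions <- as.numeric(table(predicted)) / length(predicted)
simpson <- sum(proportions^2)

print(accuracy)
print(simpson)
```

Note that a classifier can score high on this index while being wrong: the index measures how consistent the choices are, not whether they are correct.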

Details

An object generated by the samplesize.penalize function can, of course, be split into its parts and plotted using any other routine. The method described here is a simple shortcut: rather than setting up your plot parameters from scratch, you can get acceptable results with the single generic function plot; see the examples below.
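
As a rough idea of what plotting "from scratch" might involve, the sketch below draws observations and a lowess trendline with base graphics. It does not read anything from a "stylo.results" object: the vectors sample_size and accuracy are simulated, hypothetical stand-ins for values you would first have to extract from your own results.

```r
# Simulated, hypothetical data (not produced by samplesize.penalize):
# five accuracy scores for each of ten sample sizes.
set.seed(1)
sample_size <- rep(seq(500, 5000, by = 500), each = 5)
accuracy <- 1 - exp(-sample_size / 1500) +
  rnorm(length(sample_size), sd = 0.05)
accuracy <- pmin(1, pmax(0, accuracy))  # keep the scores within [0, 1]

# lowess() returns a list with components x and y holding the smoothed curve.
trend <- lowess(sample_size, accuracy)

# Individual observations in gray, with the trendline drawn on top.
plot(sample_size, accuracy, pch = 16, col = "gray60",
     xlab = "sample size", ylab = "accuracy",
     main = "Simulated data")
lines(trend, lwd = 2)
legend("bottomright", legend = c("observations", "trendline"),
       pch = c(16, NA), lty = c(NA, 1), lwd = c(NA, 2),
       col = c("gray60", "black"))
```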

Author(s)

Maciej Eder

See Also

samplesize.penalize

Examples

## Not run: 
# provided that there exists a text collection (text files)
# in the subdirectory 'corpus', perform a test for sample size:
results = samplesize.penalize(corpus.dir = "corpus")

# then plot the results for the first text (by default, Simpson's diversity index):
plot(results)

# plot the results, e.g. for the 5th text:
plot(results, target = 5)

# the 'target' parameter can also be set via the text's name;
# to see which texts are available in the results, type:
results$test.texts

# plot Simpson's diversity index for the text named 'Woolf_Years_1937':
plot(results, target = "Woolf_Years_1937", variable = "diversity")


## End(Not run)

stylo documentation built on May 29, 2024, 1:37 a.m.