plot.sample.size: Plot Classification Accuracy for Short Text Samples
plot.sample.size

R Documentation

Plot Classification Accuracy for Short Text Samples

Description

Plotting method for objects of class "stylo.results" produced by the function samplesize.penalize. It shows how short samples behave in text classification. See the help page of samplesize.penalize for further details.

Usage

## S3 method for class 'sample.size'
plot(x, target = NULL, variable = "diversity",
      trendline = TRUE, observations = FALSE,
      grayscale = FALSE, legend = TRUE, 
      legend_pos = "bottomright", main = "default", ...)

Arguments

`x`	an object of class `"stylo.results"` as produced by the function `samplesize.penalize`.
`target`	the index of the text to be plotted, or its name as stored in the `"stylo.results"` object (see the examples below). The two forms are equivalent; a numeric value n selects the n-th text. If no target is specified, the first text is plotted.
`variable`	either `"accuracy"`, to plot the classification accuracy, i.e. the ratio of correctly attributed instances to the number of iterations (usually 100; see the help page of `samplesize.penalize` for further details), or `"diversity"` (the default), to plot Simpson's index of class imbalance. The index indicates how consistent the classifier was in its choices.
`trendline`	since a plot showing every observation can be difficult to read, a trendline is drawn instead by default. Trendlines are produced with the `lowess` function.
`observations`	set to `TRUE` to plot the individual observations; they can be combined with a trendline (see above). Default: `FALSE`.
`grayscale`	set to `TRUE` to draw the plot without colors (default: `FALSE`).
`legend`	whether to add a legend explaining the trendlines and/or observations (default: `TRUE`).
`legend_pos`	position of the legend: one of `"bottomright"`, `"bottomleft"`, `"topright"` or `"topleft"`.
`main`	title of the plot; use it as you would the `main` argument of `plot`, or leave it as `"default"` to use the name of the sample, extracted automatically from the `"stylo.results"` object.
`...`	further arguments to be passed to `plot`.
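
To make the `"diversity"` option more concrete, here is a minimal sketch of Simpson's index computed as the sum of squared class proportions. The labels are invented for illustration, `simpson_index` is a hypothetical helper rather than a stylo function, and the package may use a different variant of the index (for instance, one minus this sum).

```r
# Hypothetical classifier choices for one sample size across 10 iterations
# (made-up labels, not output of samplesize.penalize)
choices <- c("Woolf", "Woolf", "Joyce", "Woolf", "Woolf",
             "Conrad", "Woolf", "Woolf", "Joyce", "Woolf")

# Simpson's index as the sum of squared class proportions
# (an assumption: stylo's exact formula may differ)
simpson_index <- function(labels) {
  p <- table(labels) / length(labels)  # proportion of each chosen class
  sum(p^2)
}

simpson_index(choices)        # 0.7^2 + 0.2^2 + 0.1^2 = 0.54
simpson_index(rep("Woolf", 10))  # a fully consistent classifier gives 1
```

Under this formulation, values close to 1 mean the classifier kept choosing the same class, while lower values mean its choices were spread across several classes.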

Details

An object generated by the samplesize.penalize function can, of course, be split into its parts and plotted using any other routine. The method described here is a simple shortcut: rather than setting up your plot parameters from scratch, you can get acceptable results with a single call to the generic function plot; see the examples below.
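
As a rough illustration of the manual route, the sketch below draws observations and a `lowess` trendline with base graphics. It uses synthetic data only: the vectors `sample_size` and `accuracy` are invented stand-ins, and nothing here reflects the actual internal structure of a `"stylo.results"` object.

```r
# Synthetic stand-ins for sample sizes and accuracy scores
# (hypothetical values, not extracted from a real results object)
set.seed(1)
sample_size <- seq(500, 10000, by = 500)
accuracy <- 1 - exp(-sample_size / 2500) +
  rnorm(length(sample_size), sd = 0.05)
accuracy <- pmin(1, pmax(0, accuracy))  # keep scores within [0, 1]

# Individual observations
plot(sample_size, accuracy, pch = 16, col = "gray50", ylim = c(0, 1),
     xlab = "sample size (words)", ylab = "accuracy",
     main = "Synthetic example")

# Smoothed trendline computed with lowess
trend <- lowess(sample_size, accuracy)
lines(trend, col = "red", lwd = 2)

legend("bottomright", legend = c("observations", "lowess trendline"),
       pch = c(16, NA), lty = c(NA, 1), col = c("gray50", "red"))
```

The plot method bundles choices of this kind (trendline, observations, legend position) behind its arguments, which is why it is usually the quicker option.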

Author(s)

Maciej Eder

Examples

## Not run: 
# provided that there exists a text collection (text files)
# in the subdirectory 'corpus', perform a test for sample size:
results = samplesize.penalize(corpus.dir = "corpus")

# then plot the results for the first text (with the default settings,
# this shows Simpson's diversity index rather than accuracy):
plot(results)

# plot the results, e.g. for the 5th text:
plot(results, target = 5)

# the 'target' parameter can also be set via the text's name;
# to see which texts are available in the results, type:
results$test.texts

# plot Simpson's diversity index for the text named 'Woolf_Years_1937':
plot(results, target = "Woolf_Years_1937", variable = "diversity")


## End(Not run)

stylo documentation built on May 29, 2024, 1:37 a.m.