

preText: Diagnostics to Assess The Effects of Text Preprocessing Decisions

Calculates preText scores for each preprocessing specification.

preText(preprocessed_documents, dataset_name = "Documents",
  distance_method = "cosine", num_comparisons = 50, parallel = FALSE,
  cores = 1, verbose = TRUE)

`preprocessed_documents`	A list object generated by the 'factorial_preprocessing()' function.
`dataset_name`	A string indicating the name to be associated with the results. Defaults to "Documents".
`distance_method`	The method that should be used for calculating document distances. Defaults to "cosine".
`num_comparisons`	If method = "distribution", the number of ranks to use in calculating average difference. Defaults to 50.
`parallel`	Logical indicating whether factorial preprocessing should be performed in parallel. Defaults to FALSE.
`cores`	The number of cores to use when parallel = TRUE. Defaults to 1; can be set to any number less than or equal to the number of cores on one's computer.
`verbose`	Logical indicating whether more information should be printed to the screen to let the user know about progress. Defaults to TRUE.
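
As a hedged illustration of the `parallel` and `cores` arguments, the sketch below picks a core count that never exceeds what the machine reports. It uses parallel::detectCores() from base R. The variable names `available`, `n_cores`, and `use_parallel` are illustrative and not part of preText, and leaving one core free is a convention assumed here, not a package requirement.

```r
# Sketch: pick a safe value for the `cores` argument.
# detectCores() can return NA on some platforms, so fall back to 1.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Leave one core free for the rest of the system (an assumption, not a rule),
# but never go below 1.
n_cores <- max(1L, as.integer(available) - 1L)

# Only request parallel execution when more than one core would be used.
use_parallel <- n_cores > 1L

# These values could then be passed along, for example:
# preText(preprocessed_documents, parallel = use_parallel, cores = n_cores)
```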

A result list object.

To use this package, you will first want to check out the factorial_preprocessing() function, which takes raw data and transforms it into document-frequency matrices using a factorial design over 6-7 different preprocessing decisions. The next step in most applications is to run the preText() function, which generates preText scores for each preprocessing specification. These scores can then be passed to the preText_score_plot() and regression_coefficient_plot() functions to generate interpretable output. For more information on additional functions, see the GitHub README for this package (https://github.com/matthewjdenny/preText) or the "getting started" vignette, which can be opened by typing 'vignette("getting_started_with_preText")' into the console.

## Not run: 
# load the package
library(preText)
# load in the data
data("UK_Manifestos")
# preprocess data
preprocessed_documents <- factorial_preprocessing(
    UK_Manifestos,
    use_ngrams = TRUE,
    infrequent_term_threshold = 0.02,
    verbose = TRUE)
# run preText
preText_results <- preText(
    preprocessed_documents,
    dataset_name = "UK Manifestos",
    distance_method = "cosine",
    num_comparisons = 100,
    verbose = TRUE)

## End(Not run)
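
The plotting step mentioned above might look like the following sketch. It assumes the `preText_results` object from the example has already been created and that the package is installed. The `remove_intercept = TRUE` argument follows the usage shown in the package README as best recalled, so check the function's help page before relying on it. The `have_results` guard is illustrative and not part of preText.

```r
# Sketch: visualize the output of preText().
# Runs only when the package is installed and the results object exists.
have_results <- requireNamespace("preText", quietly = TRUE) &&
    exists("preText_results")

if (have_results) {
    # dot plot of preText scores for each preprocessing specification
    preText::preText_score_plot(preText_results)
    # regression coefficients for each preprocessing decision
    # (remove_intercept is assumed from the README; see ?regression_coefficient_plot)
    preText::regression_coefficient_plot(preText_results,
                                         remove_intercept = TRUE)
}
```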