preText: Diagnostics to Assess the Effects of Text Preprocessing Decisions

Description

preText: Diagnostics to Assess The Effects of Text Preprocessing Decisions

Calculates preText scores for each preprocessing specification.

Usage

preText(preprocessed_documents, dataset_name = "Documents",
  distance_method = "cosine", num_comparisons = 50, parallel = FALSE,
  cores = 1, verbose = TRUE)

Arguments

preprocessed_documents

A list object generated by the 'factorial_preprocessing()' function.

dataset_name

A string indicating the name to be associated with the results. Defaults to "Documents".

distance_method

The method that should be used for calculating document distances. Defaults to "cosine".

num_comparisons

If method = "distribution", the number of ranks to use in calculating the average difference. Defaults to 50.

parallel

Logical indicating whether factorial preprocessing should be performed in parallel. Defaults to FALSE.

cores

Number of cores to use. Defaults to 1; can be set to any number less than or equal to the number of cores on one's computer.

verbose

Logical indicating whether more information should be printed to the screen to let the user know about progress. Defaults to TRUE.

Value

A result list object.

preText functions

To use this package, you will first want to check out the factorial_preprocessing() function, which takes raw data and transforms it into document-frequency matrices using a factorial design and 6-7 different preprocessing decisions. The next step in most applications is to run the preText() function, which generates preText scores for each preprocessing specification. These can then be fed to the preText_score_plot() and regression_coefficient_plot() functions to generate interpretable output. For more information on additional functions, see the GitHub README for this package (https://github.com/matthewjdenny/preText) or the "getting started" vignette, which you can open by typing 'vignette("getting_started_with_preText")' into the console.

Examples

## Not run: 
# load the package
library(preText)
# load in the data
data("UK_Manifestos")
# preprocess data
preprocessed_documents <- factorial_preprocessing(
    UK_Manifestos,
    use_ngrams = TRUE,
    infrequent_term_threshold = 0.02,
    verbose = TRUE)
# run preText
preText_results <- preText(
    preprocessed_documents,
    dataset_name = "UK Manifestos",
    distance_method = "cosine",
    num_comparisons = 100,
    verbose = TRUE)

## End(Not run)
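
The plotting step described under "preText functions" could be sketched as follows. This is a minimal sketch, not part of the original examples: it assumes the preText_results object created by the preText() call above, and the remove_intercept argument is an assumption taken from the package README, so check the help pages for both functions before relying on it.

```r
# Sketch only: assumes preText_results was created by the preText() call above.
library(preText)

# Plot the preText score for each preprocessing specification.
preText_score_plot(preText_results)

# Plot regression coefficients for the preprocessing decisions.
# remove_intercept = TRUE is an assumed argument; see ?regression_coefficient_plot.
regression_coefficient_plot(preText_results, remove_intercept = TRUE)
```

Like the example above, this needs the preText package installed and can take a while on larger corpora.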

preText documentation built on May 1, 2019, 8:27 p.m.