PreProcess: Pre-process texts to create a corpus suitable for...


Description

PreProcess prepares texts and author information for use with ExpAgendaVonmon.

Usage

  PreProcess(textsDF = NULL, TextsCol, AuthorCol, IDCol,
    textsPattern = NULL, authorsDF = NULL,
    removeNumbers = TRUE, StopWords = NULL,
    removeAuthors = NULL, sparse = 0.4)
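
As a minimal sketch of how the arguments above fit together when the texts and authors sit in one data frame, the block below builds a toy data frame and wraps the call in a small helper. The column names `text`, `author` and `id`, the example texts, and the helper name `prep_press` are all hypothetical; only the argument names come from the Usage block. The helper has not been checked against a particular ExpAgenda release, so treat it as an illustration of the argument mapping rather than a tested recipe.

```r
# Toy data frame. The column names "text", "author" and "id" are
# hypothetical and chosen only for this sketch.
press <- data.frame(
  id     = 1:4,
  author = c("Smith", "Smith", "Jones", "Jones"),
  text   = c("Tax cuts for families in 2012.",
             "Tax reform and new jobs.",
             "Health care and jobs.",
             "Tax credits for health care."),
  stringsAsFactors = FALSE
)

# Hypothetical helper that forwards a data frame to PreProcess using the
# argument names from the Usage block. Defining it does not need
# ExpAgenda to be installed; calling it does.
prep_press <- function(df) {
  ExpAgenda::PreProcess(
    textsDF       = df,
    TextsCol      = "text",    # column holding the texts
    AuthorCol     = "author",  # column holding the author names
    IDCol         = "id",      # column uniquely identifying each text
    removeNumbers = TRUE,
    sparse        = 0.4
  )
}
```

With ExpAgenda installed, `prep_press(press)` would pass the toy data through `PreProcess`; a real corpus would normally be much larger than four texts.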

Arguments

textsDF

a data frame containing a column with texts and a column with author names. Unnecessary if textsPattern and authorsDF are set.

TextsCol

character string identifying the column in textsDF with the texts.

AuthorCol

character string naming the column, in either textsDF or authorsDF, that identifies the authors.

IDCol

character string naming the column, in either textsDF or authorsDF, that uniquely identifies each text.

textsPattern

character string. Regular expression pattern identifying the texts in textsDF. Unnecessary if textsDF is set.

authorsDF

a data frame with author information for each text in textsDF. Its rows must be in the same order as the texts. Unnecessary if textsDF is set.

removeNumbers

logical. Whether or not to remove numbers from the texts.

StopWords

character vector of stop words to remove. If StopWords = NULL (the default) then tm's default English stop word list will be used. See stopwords.
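
A custom StopWords vector can be built on top of tm's English list. The sketch below assumes the tm package is installed; the extra words `"senator"` and `"press"` are hypothetical examples of corpus-specific terms, not values taken from ExpAgenda.

```r
library(tm)

# tm's built-in English stop word list, which the text above describes
# as the default used when StopWords = NULL.
default_stops <- stopwords("english")

# Extend the default list with corpus-specific words (hypothetical
# examples). The result is a plain character vector.
custom_stops <- c(default_stops, "senator", "press")

head(custom_stops)
```

A vector like `custom_stops` is the kind of value the StopWords argument expects.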

removeAuthors

character vector. The names of authors to remove.

sparse

numeric. The maximal allowed sparsity. See removeSparseTerms.
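
To give a feel for what a sparsity threshold of 0.4 does, the sketch below applies tm's `removeSparseTerms` directly to a toy document-term matrix. It assumes the tm package is installed and that `PreProcess` hands its sparse value to `removeSparseTerms`, which the cross-reference above suggests but this page does not spell out. The four example texts are made up.

```r
library(tm)

# Four made-up documents.
docs <- c("tax cuts for families",
          "tax reform and jobs",
          "health care and jobs",
          "tax credits for health care")

# Build a document-term matrix with tm's default settings.
corpus <- VCorpus(VectorSource(docs))
dtm <- DocumentTermMatrix(corpus)

# With sparse = 0.4, a term survives only if it occurs in more than
# 60% of the documents, which here means at least 3 of the 4.
dtm_small <- removeSparseTerms(dtm, sparse = 0.4)

ncol(dtm)         # number of terms before trimming
Terms(dtm_small)  # terms that survive the threshold
```

Lower values of sparse keep only terms shared by most documents, while values close to 1 keep nearly every term.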

Value

Returns an object of class ExpAgendaDTMatrix that can be used with ExpAgendaVonmon to estimate authors' expressed agendas in documents. The object contains three matrices: doc.term is a document-term matrix, authors locates the authors of the texts in doc.term, and authorID is used by DocTopics to return the documents in their original order.

Source

I. Feinerer, K. Hornik, and D. Meyer. Text mining infrastructure in R. Journal of Statistical Software, 25(5):1-54, March 2008. http://www.jstatsoft.org/v25/i05.


christophergandrud/ExpAgenda documentation built on May 13, 2019, 7:01 p.m.