prep_corpus: Create a database of text and metadata to feed into...

Description

Create a database of text and metadata to feed into make_model

Usage

prep_corpus(xmlfolder, date_vec = date_vec, wordsToRemove = NULL,
  stemDoc = FALSE, pattern = ".tei.xml", journalVec = NULL,
  yearRangeRule = NULL)

Arguments

xmlfolder

string path to the input folder of text files to process. Typically contains either XML files from the MJP (Modernist Journals Project) or files cleaned with stripXML

date_vec

vector a vector of dates, usually from strip_dates

wordsToRemove

vector a vector of words to remove from the corpus; an initial stopword list

stemDoc

logical if TRUE, stems the text with stemDocument (a function from the tm package, which relies on SnowballC for stemming)

pattern

string the file-name ending of the files in xmlfolder; defaults to '.tei.xml', but should be '.txt' if xmlfolder was produced by stripXML

journalVec

vector a vector of the names of journals to be included. Possible examples are "Blast", "Egoist", "Poetry Magazine", "Freewoman", "NewFreewoman"

yearRangeRule

string the rule specifying which years are to be included. Can take values like "> 1900", "==1919", etc.
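
The pattern, journalVec and yearRangeRule arguments above all narrow down what ends up in the corpus. The following is a minimal base-R sketch of that kind of selection, not the code prep_corpus actually runs. The file names, the meta data frame and its journal and year columns are hypothetical and exist only for illustration.

```r
# Hypothetical toy folder; these file names are illustrative only.
tmp <- file.path(tempdir(), "prep_corpus_sketch")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(
  tmp, c("Blast_1914.tei.xml", "Egoist_1919.tei.xml", "notes.txt")
)))

# Keep only files whose names end in the given literal ending.
# endsWith() compares plain strings, so the dots are not treated as regex wildcards.
ending <- ".tei.xml"
all_files <- list.files(tmp)
xml_files <- all_files[endsWith(all_files, ending)]

# Hypothetical metadata table (assumed columns: journal, year).
meta <- data.frame(
  journal = c("Blast", "Egoist", "Poetry Magazine", "Egoist"),
  year = c(1914, 1919, 1912, 1915),
  stringsAsFactors = FALSE
)

journalVec <- c("Blast", "Egoist")
yearRangeRule <- "> 1914"

# Journal filter: keep rows whose journal name appears in journalVec.
keep_journal <- meta$journal %in% journalVec

# Year filter: paste the rule onto the column name ("year > 1914"),
# then evaluate that expression with the data frame's columns in scope.
keep_year <- eval(parse(text = paste("year", yearRangeRule)), envir = meta)

selected <- meta[keep_journal & keep_year, ]
print(xml_files)
print(selected)
```

Evaluating a rule string with eval(parse()) runs whatever code the string contains, so a sketch like this is only appropriate when the rule comes from a trusted source.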

Value

a data.frame of text for running with MALLET

See Also

strip_dates and stripXML, which this function depends on, and make_model, which takes the data.frame created by this function as input

Examples

dataframe <- stripped_xmlData

mlinegar/litMagModelling documentation built on May 23, 2019, 2:12 a.m.