check_textcorpus_data: Check if Data Meets 'textcorpus' Standards


Description

check_textcorpus_data - Ensure a corpus data set meets textcorpus standards. The data is expected to contain the columns c('id', 'author', 'text'). The 'text' column must be a character vector. Non-ASCII characters are not allowed in any column.

check_textcorpus_meta_data - Ensure a corpus meta data set meets textcorpus standards. The data is expected to contain an 'id' column at minimum. Other 'id'-level meta data columns are accepted as well. The class of the 'id' column must match the class of the 'id' column in 'x'. Non-ASCII characters are not allowed in any column.
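
The rules described above can be approximated in base R. The following is only a rough sketch of such checks, not the package's implementation: the helper names sketch_has_non_ascii, sketch_check_data and sketch_check_meta are hypothetical, and returning a character vector of problems (empty when everything passes) is an assumption made for illustration.

```r
# Hypothetical helpers for illustration only; NOT part of textcorpus.

# TRUE if any non-missing value in `col` contains a non-ASCII character.
# Same idea as tools::showNonASCII(): treat the bytes as latin1 and see
# whether they survive conversion to ASCII unchanged.
sketch_has_non_ascii <- function(col) {
    vals <- as.character(col)
    asc <- iconv(vals, "latin1", "ASCII")
    any(!is.na(vals) & (is.na(asc) | asc != vals))
}

# Names of the columns in `x` that contain non-ASCII characters.
sketch_non_ascii_cols <- function(x) {
    names(x)[vapply(x, sketch_has_non_ascii, logical(1))]
}

# Check the main corpus data set; returns a character vector of problems.
sketch_check_data <- function(x) {
    problems <- character(0)
    missing_cols <- setdiff(c("id", "author", "text"), names(x))
    if (length(missing_cols) > 0) {
        problems <- c(problems,
            paste("missing columns:", paste(missing_cols, collapse = ", ")))
    }
    if ("text" %in% names(x) && !is.character(x[["text"]])) {
        problems <- c(problems, "'text' is not a character vector")
    }
    bad <- sketch_non_ascii_cols(x)
    if (length(bad) > 0) {
        problems <- c(problems,
            paste("non-ASCII characters in:", paste(bad, collapse = ", ")))
    }
    problems
}

# Check the meta data set against the main data set `x`.
sketch_check_meta <- function(x, meta) {
    problems <- character(0)
    if (!"id" %in% names(meta)) {
        problems <- c(problems, "missing column: id")
    } else if (!identical(class(meta[["id"]]), class(x[["id"]]))) {
        problems <- c(problems, "'id' class differs between 'x' and 'meta'")
    }
    bad <- sketch_non_ascii_cols(meta)
    if (length(bad) > 0) {
        problems <- c(problems,
            paste("non-ASCII characters in:", paste(bad, collapse = ", ")))
    }
    problems
}
```

Collecting all problems before reporting, rather than stopping at the first one, is a design choice of this sketch; how the real functions report failures is not stated here.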

Usage

check_textcorpus_data(x, ...)

check_textcorpus_meta_data(x, meta, ...)

Arguments

x

The main corpus data set.

meta

The meta data set.

...

ignored.

Examples

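library(dplyr)
library(stringi)

# Build a small corpus with random ids and lorem ipsum text.
# tibble() replaces the deprecated data_frame().
corp <- tibble(
    id = stri_rand_strings(10, 10),
    author = c('sam', 'cal', 'sue', 'bob', 'sal', 'pam', 'pat', 'joe', 'arr', 'nmr'),
    text = stri_rand_lipsum(10),
    state = state.name[1:10],
    month = month.name[1:10]
)

# Check the main corpus data, then the 'id'-level meta data columns.
check_textcorpus_data(corp)
check_textcorpus_meta_data(corp, corp[c('id', 'state', 'month')])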

trinker/textcorpus documentation built on June 1, 2019, 12:53 a.m.