searchCorpus: Simple example function to extract data from the corpus


Description

If you search for a frequent pattern, there is a real risk of running out of memory, because by default all matches are collected in R. If you specify an output directory, the results are written to files and kept out of memory. Linux/Mac users may prefer the command-line tool 'grep', which is much faster than this function, e.g. grep [PATTERN] corpus_directory/*csv > outputfile.csv. Information on POS and dependency labels: http://universaldependencies.org/format.html.
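The memory trade-off described above can be sketched in base R. The helper search_csv_dir() below is hypothetical and is not the tweetCorp implementation of searchCorpus. It assumes the corpus is a directory of CSV files that share the same columns, one of which holds the text to search. Without an output directory, it accumulates matches in a data.frame. With one, it writes each file's matches to disk as it goes, so only one input file is held in memory at a time.

```r
# Hypothetical helper, NOT part of tweetCorp: a minimal sketch of searching
# a directory of CSV files, either in memory or by writing matches to disk.
# Assumes all CSV files share the same columns, including `field`.
search_csv_dir <- function(pattern, corpus_directory, output_directory = NULL,
                           field = "text") {
  files <- list.files(corpus_directory, pattern = "\\.csv$", full.names = TRUE)
  results <- list()
  for (f in files) {
    # Only this one file is held in memory while it is searched
    dat <- utils::read.csv(f, stringsAsFactors = FALSE)
    hits <- dat[grepl(pattern, dat[[field]]), , drop = FALSE]
    if (is.null(output_directory)) {
      # In-memory mode: matches pile up here, which is what can exhaust RAM
      results[[length(results) + 1]] <- hits
    } else if (nrow(hits) > 0) {
      # Output mode: write this file's matches to disk and move on
      utils::write.csv(hits, file.path(output_directory, basename(f)),
                       row.names = FALSE)
    }
  }
  if (is.null(output_directory)) {
    return(do.call(rbind, results))
  }
  invisible(NULL)
}

# Usage with a tiny throwaway corpus in a temporary directory
corpus <- tempfile("corpus_demo")
dir.create(corpus)
utils::write.csv(data.frame(text = c("hola mundo", "adios", "hola otra vez")),
                 file.path(corpus, "part1.csv"), row.names = FALSE)

res <- search_csv_dir("hola", corpus)            # data.frame with 2 rows

out <- tempfile("search_out")
dir.create(out)
written <- search_csv_dir("hola", corpus, out)   # NULL; results are on disk
```

The loop reads one file at a time, so output mode bounds memory use by the largest single file rather than by the total number of matches.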

Usage

searchCorpus(pattern, corpus_directory, output_directory = NULL,
  field = c("tagged", "text"))

Arguments

pattern

word or regular expression

corpus_directory

directory containing the corpus files

output_directory

directory where you want the output files to go. Leave as NULL (the default) if you want to collect the results in an R object rather than write them out to files

field

one of 'tagged' or 'text'. To search the parsed data, choose 'tagged'; otherwise, choose 'text'
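The pattern and field arguments can be illustrated with base R alone. This sketch assumes, without confirmation from the package source, that pattern is matched with R's default regular-expression engine (as in grepl()) and that field is resolved with match.arg(), the usual idiom for an argument declared as c("tagged", "text"). The sample texts and the pick_field() function are hypothetical.

```r
# Illustration only: base R, not tweetCorp code.

# pattern: a plain word matches as a substring, even inside other words
texts <- c("the cat sat", "concatenate", "Cat food")
substring_hits <- grepl("cat", texts)
# Word boundaries (\\b) plus a character class restrict it to the whole word,
# in either capitalisation
word_hits <- grepl("\\b[Cc]at\\b", texts)

# field: hypothetical stand-in showing how an argument declared as
# field = c("tagged", "text") behaves when resolved with match.arg()
pick_field <- function(field = c("tagged", "text")) {
  match.arg(field)
}
default_field <- pick_field()        # first choice is the default: "tagged"
chosen_field  <- pick_field("text")  # "text"
partial_field <- pick_field("te")    # partial matching also gives "text"
```

Under the match.arg() assumption, omitting field would search the 'tagged' column, and any value other than a (partial) match for 'tagged' or 'text' would raise an error.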

Value

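A data.frame with the results if output_directory is NULL; otherwise nothing, as the results are written to files in output_directory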


jeroenclaes/tweetCorp documentation built on May 27, 2019, 4:50 a.m.