Stopword_Maker: Find the $N$ most frequent words in a corpus.


View source: R/Personal_Functions.R

Description

This function finds the $N$ most frequently used words in a corpus. These words can then be treated as stop words and removed to prune data sets before training.
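The idea can be illustrated with a minimal base-R sketch. The helper name `top_words_sketch` is hypothetical, and the tokenization (lowercasing, splitting on anything other than letters and apostrophes) is an assumption made for illustration; it is not necessarily how Stopword_Maker is implemented.

```r
# Hypothetical helper, NOT the LilRhino implementation: count word
# frequencies across all documents and return the n most frequent words.
top_words_sketch <- function(docs, n = 20) {
  # Assumption: lowercase, then split on anything that is not a letter or apostrophe
  tokens <- unlist(strsplit(tolower(docs), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]                  # drop empty strings
  counts <- sort(table(tokens), decreasing = TRUE)  # most frequent first
  names(counts)[seq_len(min(n, length(counts)))]
}

docs <- c("the cat sat on the mat", "the dog sat", "the end")
top_words_sketch(docs, 2)
# [1] "the" "sat"
```

When several words share the same count, the order among them depends on how the sort breaks ties, so only clearly separated counts give a predictable ranking.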

Usage

Stopword_Maker(titles, cutoff = 20)

Arguments

titles

The documents (for example, a character vector with one document per element) in which to find the most frequent words.

cutoff

The number $N$ of most frequently used words to keep as stop words (default 20).

Value

output

A vector of the $N$ most frequent words.

Author(s)

Travis Barton

Examples

library(LilRhino)
# A small corpus: each element is one document
test_set = c('this is a testset', 'I am searching for a list of words',
'I like turtles',
'A rocket would be a fast way of getting to work, but I do not think it is very practical')
# Keep the 4 most frequent words as stop words
res = Stopword_Maker(test_set, 4)
print(res)
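
Once a stop-word vector is available, it can be used to prune the documents. The sketch below is one possible way to do this in base R; `remove_stopwords_sketch` is a hypothetical helper that is not part of LilRhino, and the stop-word vector is written out by hand here so the snippet runs on its own. In practice it could be the value returned by Stopword_Maker.

```r
# Hypothetical helper (not part of LilRhino): drop stop words from each document.
remove_stopwords_sketch <- function(docs, stopwords) {
  vapply(docs, function(doc) {
    words <- strsplit(doc, "\\s+")[[1]]            # split on whitespace
    # Assumption: matching is case-insensitive
    kept <- words[!(tolower(words) %in% tolower(stopwords))]
    paste(kept, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

docs <- c("this is a testset", "I like turtles")
remove_stopwords_sketch(docs, c("a", "i", "is"))
# [1] "this testset" "like turtles"
```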

LilRhino documentation built on Oct. 31, 2019, 4:59 p.m.