Stopword_Maker: Find the $N$ most frequent words in a corpus.

View source: R/Personal_Functions.R

Stopword_Maker R Documentation

Find the $N$ most frequent words in a corpus.

Description

This function finds the $N$ most frequently used words in a corpus. These words are candidate stop words, which can be removed to prune a data set before training a model.
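To make the idea concrete, here is a minimal base R sketch of frequency-based stop word selection. It is not the LilRhino implementation: `top_words_sketch` is a hypothetical helper, and its tokenization (lower-casing, splitting on anything other than letters and apostrophes) is an assumption that may differ from what Stopword_Maker does internally.

```r
# Hypothetical helper, NOT part of LilRhino: a sketch of picking the
# n most frequent words in a character vector of documents.
top_words_sketch <- function(docs, n = 20) {
  # Assumed tokenization: lower-case, then split on runs of characters
  # that are neither letters nor apostrophes.
  tokens <- unlist(strsplit(tolower(docs), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings left by the split

  # Count each distinct word and order from most to least frequent.
  counts <- sort(table(tokens), decreasing = TRUE)

  # Return at most n words.
  names(counts)[seq_len(min(n, length(counts)))]
}

docs <- c("the cat sat on the mat", "the dog chased the cat")
top_words_sketch(docs, 2)
```

In this toy corpus "the" occurs four times and "cat" twice, so those two words come back as the candidates. How ties between equally frequent words are broken depends on the sort, so a real implementation may order them differently.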

Usage

Stopword_Maker(titles, cutoff = 20)

Arguments

titles

The documents to search for the most frequent words, given as a character vector with one document per element.

cutoff

The number $N$ of most frequently used words to keep as stop words. Defaults to 20.

Value

output

A vector of the $N$ most frequent words.

Author(s)

Travis Barton

Examples

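library(LilRhino)

# Four short documents to search for frequent words
test_set <- c('this is a testset', 'I am searching for a list of words',
              'I like turtles',
              'A rocket would be a fast way of getting to work, but I do not think it is very practical')
# Keep the 4 most frequent words as stop words
res <- Stopword_Maker(test_set, 4)
print(res)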
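Once a stop word vector is available, it can be used to prune the documents before training. The base R sketch below shows one possible way to do that. `remove_stopwords_sketch` is a hypothetical helper, not a LilRhino function, and it assumes words are separated by whitespace, so a token with attached punctuation such as "work," would not match the stop word "work".

```r
# Hypothetical helper, NOT part of LilRhino: remove stop words from each
# document, assuming whitespace-separated words and case-insensitive matching.
remove_stopwords_sketch <- function(docs, stopwords) {
  stopwords <- tolower(stopwords)
  vapply(strsplit(tolower(docs), "\\s+"), function(words) {
    # Keep only the words that are not in the stop word vector.
    paste(words[!words %in% stopwords], collapse = " ")
  }, character(1))
}

pruned <- remove_stopwords_sketch(c("I like turtles", "this is a testset"),
                                  c("a", "i", "is"))
print(pruned)
```

In practice the `stopwords` argument would be the vector returned by Stopword_Maker, for example `res` from the example above.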

LilRhino documentation built on April 28, 2022, 1:06 a.m.