count.words: Word Frequency in a Text File

View source: R/count.words.R

count.words    R Documentation

Word Frequency in a Text File

Description

Counts how many times each word, or each multi-word phrase, appears in a text file.

Usage

count.words(
  file,
  wordclump = 1,
  ignore.case = TRUE,
  stopwords = "",
  string,
  numbers.keep = TRUE,
  ...
)

Arguments

file

Character string giving the filename, with or without a path, of the text file to be analyzed. Words are assumed to be separated by spaces.

wordclump

Number of words per clump. Default is 1, which counts single words. For example, wordclump=2 counts how often each 2-word phrase appears.

ignore.case

Logical. Default is TRUE, which means counting is not case-sensitive.

stopwords

Optional character vector of words to ignore (not count). Default is "", which means no words are ignored.

string

A single character string containing text to analyze. Not yet implemented.

numbers.keep

Not yet implemented. Intended to control whether numbers are kept (TRUE, the default) or ignored.

...

Any other parameters used by scan() may be passed through. See http://stat.ethz.ch/R-manual/R-devel/library/base/html/scan.html
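
The arguments above can be illustrated with a minimal base R sketch. This is not the source of count.words(); it only assumes that the file is read with scan(), that ignore.case corresponds to folding everything to lower case, and that stopwords are dropped before clumps are formed. The sample text and the variable names (tmp, kept, clumps) are hypothetical.

```r
# Hypothetical sketch in base R; NOT the count.words() implementation.
# Write a tiny sample file so the example is self-contained.
tmp <- tempfile(fileext = ".txt")
writeLines("The cat sat on the mat and the cat slept", tmp)

# scan() returns whitespace-separated tokens when what = "character".
words <- scan(tmp, what = "character", quiet = TRUE)

# Assumption: ignore.case = TRUE behaves like folding to lower case.
words <- tolower(words)

# Assumption: stopwords are removed before clumps are built.
stopwords <- c("the", "and")
kept <- words[!(words %in% stopwords)]

# wordclump = 2: join each word with its successor into a 2-word phrase.
wordclump <- 2
n <- max(length(kept) - wordclump + 1, 0)
clumps <- vapply(
  seq_len(n),
  function(i) paste(kept[i:(i + wordclump - 1)], collapse = " "),
  character(1)
)

unlink(tmp)  # remove the temporary file
```

Whether the function itself filters stopwords before or after forming clumps is not stated in this documentation, so results for wordclump > 1 combined with stopwords may differ from this sketch.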

Value

Returns a data.frame with two columns, term (the word or phrase) and freq (the number of times it appears in the file), sorted by frequency. The row names are also the terms found.
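
The shape of the return value can be mimicked in base R as follows. This is a hedged sketch rather than the package code: the words vector is made up, and ascending sort order is an assumption (suggested by the use of tail() in the examples, but not stated explicitly).

```r
# Hypothetical sketch of the documented return shape; NOT the package source.
words <- c("the", "cat", "sat", "on", "the", "mat", "and", "the", "cat")

# Tabulate the words and convert the table to a two-column data.frame.
tab <- table(words)
counts <- data.frame(
  term = names(tab),
  freq = as.integer(tab),
  stringsAsFactors = FALSE
)

# Assumption: ascending sort, so the most frequent terms come last.
counts <- counts[order(counts$freq), ]

# Row names duplicate the terms, which allows lookup by word.
rownames(counts) <- counts$term

counts["the", "freq"]  # frequency of a single word
tail(counts, 2)        # the two most frequent terms under this sort order
```

Because the row names repeat the term column, a word can be looked up either by row name or by filtering on counts$term.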

Examples

## Not run: 
# Default settings (not case-sensitive)
counts <- count.words('speech.txt')
tail(counts, 15)

# Case-sensitive counts, listed alphabetically by term
counts <- count.words('speech.txt', ignore.case = FALSE)
head(counts[order(counts$term), ], 15)

# Ignore some common words
counts <- count.words('speech.txt', stopwords = c('The', 'the', 'And', 'and', 'A', 'a'))
tail(counts, 15)

# Count 3-word phrases
counts <- count.words('speech.txt', wordclump = 3)
tail(counts, 30)

# Look up specific words
counts['the', ]
counts[c('the', 'and', 'notfoundxxxxx'), ]  # works only if you are sure all are found
counts[rownames(counts) %in% c('the', 'and', 'notfoundxxxxx'), ]  # works even if a word wasn't found
counts[counts$term %in% c('the', 'and', 'notfoundxxxxx'), ]  # works even if a word wasn't found

# File specified with a full path
counts <- count.words('C:/mypath/speech.txt')

# Whole sentences (sort of: sep = '.' also splits at decimal points)
counts <- count.words('speech.txt', sep = '.')

## End(Not run)

ejanalysis/analyze.stuff documentation built on April 2, 2024, 10:10 a.m.