bind_tf_idf: Bind the term frequency and inverse document frequency of a...


View source: R/bind_tf_idf.R

Description

Calculate the term frequency and inverse document frequency of a tidy text dataset, along with their product, tf-idf, and bind them to the dataset. Each of these values is added as a column.
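As a rough illustration of what the three added columns contain, the sketch below computes them by hand in base R on a small made-up table. It assumes the common definitions (tf as the count divided by the document's total count, idf as the natural log of the number of documents over the number of documents containing the term). The data and the column names `document`, `term` and `n` are hypothetical, and this is not presented as the package's exact implementation.

```r
# Toy counts: one row per term per document (hypothetical data).
counts <- data.frame(
  document = c("a", "a", "b", "b"),
  term     = c("cat", "dog", "cat", "fish"),
  n        = c(2, 2, 1, 3),
  stringsAsFactors = FALSE
)

# Term frequency: count divided by the total count in the same document.
doc_totals <- tapply(counts$n, counts$document, sum)
counts$tf <- counts$n / as.numeric(doc_totals[counts$document])

# Inverse document frequency: log(number of documents / number of
# documents containing the term). The natural log is an assumption.
n_docs <- length(unique(counts$document))
docs_per_term <- table(counts$term)
counts$idf <- log(n_docs / as.numeric(docs_per_term[counts$term]))

# tf-idf is the product of the two.
counts$tf_idf <- counts$tf * counts$idf

counts
```

Under these definitions, a term that appears in every document ("cat" here) gets an idf of zero, so its tf-idf is zero no matter how often it occurs.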

Usage

bind_tf_idf(tbl, term_col, document_col, n_col)

bind_tf_idf_(tbl, term_col, document_col, n_col)

Arguments

tbl

A tidy text dataset with one-row-per-term-per-document

term_col

Column containing terms

document_col

Column containing document IDs

n_col

Column containing document-term counts

Details

bind_tf_idf is given bare names, while bind_tf_idf_ is given strings and is therefore suitable for programming with.

If the dataset is grouped, the groups are ignored in the calculation but retained in the result.

The dataset must have exactly one row per document-term combination for this to work.
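One way to check the one-row-per-document-term requirement before calling the function is sketched below in base R. The data frame and its column names are hypothetical, and summing the counts of repeated pairs is only one possible fix, which may or may not suit a given dataset.

```r
# Hypothetical counts with an accidental duplicate of ("a", "cat").
counts <- data.frame(
  document = c("a", "a", "a", "b"),
  term     = c("cat", "dog", "cat", "cat"),
  n        = c(2, 1, 3, 4),
  stringsAsFactors = FALSE
)

# TRUE if any document-term pair appears on more than one row.
has_duplicates <- any(duplicated(counts[c("document", "term")]))

# Collapse repeated pairs by summing their counts, so that each
# document-term combination ends up with exactly one row.
collapsed <- aggregate(n ~ document + term, data = counts, FUN = sum)

has_duplicates
collapsed
```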

Examples

library(dplyr)
library(janeaustenr)
library(tidytext)  # provides unnest_tokens() and bind_tf_idf()

# count how often each word occurs in each book (one row per book-word pair)
book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE) %>%
  ungroup()

book_words

# find the words most distinctive to each document
book_words %>%
  bind_tf_idf(word, book, n) %>%
  arrange(desc(tf_idf))


tidytext documentation built on May 19, 2017, 1:49 p.m.