unnest_sentences: Wrapper around unnest_tokens for sentences, lines, and...
In insightdataintel/tidytext: Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

Description

These functions are wrappers around `unnest_tokens(token = "sentences")`, `unnest_tokens(token = "lines")`, and `unnest_tokens(token = "paragraphs")`.
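To check the wrapper relationship described above, the sketch below compares `unnest_sentences()` with the equivalent `unnest_tokens(token = "sentences")` call. It assumes the CRAN tidytext and dplyr packages are installed. The two-sentence string is made up for illustration. Because the input has a single row, the `collapse` setting cannot change the result.

```r
library(dplyr)
library(tidytext)

# Made-up one-row input, for illustration only
d <- tibble(txt = "The first sentence. The second sentence!")

# Wrapper call
via_wrapper <- d %>% unnest_sentences(sentence, txt)

# The same tokenization requested through unnest_tokens() directly
via_tokens <- d %>% unnest_tokens(sentence, txt, token = "sentences")

# Both calls should return the same tibble: one lowercased sentence per row
identical(via_wrapper, via_tokens)
via_wrapper
```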

Usage

unnest_sentences(
  tbl,
  output,
  input,
  strip_punct = FALSE,
  format = c("text", "man", "latex", "html", "xml"),
  to_lower = TRUE,
  drop = TRUE,
  collapse = NULL,
  ...
)

unnest_lines(
  tbl,
  output,
  input,
  format = c("text", "man", "latex", "html", "xml"),
  to_lower = TRUE,
  drop = TRUE,
  collapse = NULL,
  ...
)

unnest_paragraphs(
  tbl,
  output,
  input,
  paragraph_break = "\n\n",
  format = c("text", "man", "latex", "html", "xml"),
  to_lower = TRUE,
  drop = TRUE,
  collapse = NULL,
  ...
)
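The three wrappers differ mainly in the unit they split on. The sketch below assumes tidytext and dplyr are installed and uses an invented string. In it, `"\n"` ends a line and `"\n\n"` ends a paragraph, matching the default `paragraph_break`.

```r
library(dplyr)
library(tidytext)

# Invented one-row input: two lines, a blank line, then a third line
d <- tibble(txt = "First line\nsecond line\n\nA new paragraph")

# One row per non-empty line (the empty line between paragraphs is dropped)
by_line <- d %>% unnest_lines(line, txt)

# One row per paragraph, split on the default paragraph_break = "\n\n";
# line breaks inside a paragraph become spaces
by_paragraph <- d %>% unnest_paragraphs(paragraph, txt)

by_line
by_paragraph
```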

Arguments

`tbl`	A data frame.
`output`	Output column to be created, as a string or symbol.
`input`	Input column that gets split, as a string or symbol. The `output` and `input` arguments are passed by expression and support quasiquotation; you can unquote strings and symbols.
`strip_punct`	Should punctuation be stripped?
`format`	Either "text", "man", "latex", "html", or "xml". If not "text", this uses the hunspell tokenizer and can tokenize only by "word".
`to_lower`	Whether to convert tokens to lowercase. If tokens include URLs (such as with `token = "tweets"`), converted URLs may no longer be correct.
`drop`	Whether the original input column should be dropped. Ignored if the original input and new output columns have the same name.
`collapse`	Whether to combine text with newlines first, in case tokens (such as sentences or paragraphs) span multiple lines. If `NULL`, text is collapsed when the token method is "ngrams", "skip_ngrams", "sentences", "lines", "paragraphs", or "regex".
`...`	Extra arguments passed on to tokenizers.
`paragraph_break`	A string identifying the boundary between two paragraphs.
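The sketch below exercises several of these arguments at once. It assumes tidytext and dplyr are installed. The `<p>` separator and the text are invented for illustration. The input column name is held in a string and unquoted with `!!` to show the quasiquotation support described for `input`.

```r
library(dplyr)
library(tidytext)

# Invented input whose paragraphs are separated by a custom "<p>" marker
d <- tibble(id = 1, txt = "First Paragraph.<p>Second Paragraph.")

input_col <- "txt"  # input column name held in a variable

paras <- d %>%
  unnest_paragraphs(
    paragraph, !!input_col,  # unquote the string to name the input column
    paragraph_break = "<p>", # split on the custom marker instead of "\n\n"
    to_lower = FALSE,        # keep the original capitalisation
    drop = FALSE             # keep the txt column alongside the output
  )

paras
```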

See Also

unnest_tokens()

Examples

library(dplyr)
library(janeaustenr)
library(tidytext)

d <- tibble(txt = prideprejudice)

# Split the text of Pride and Prejudice into one sentence per row
d %>%
  unnest_sentences(sentence, txt)

insightdataintel/tidytext documentation built on Aug. 23, 2020, 12:44 a.m.
