textstat_frequency: Tabulate feature frequencies
In quanteda.textstats: Textual Statistics for the Quantitative Analysis of Textual Data

textstat_frequency

R Documentation

Tabulate feature frequencies

Description

Produces summaries of the counts and document frequencies of the features in a dfm, optionally grouped by a docvars variable or another supplied grouping variable.

Usage

textstat_frequency(
  x,
  n = NULL,
  groups = NULL,
  ties_method = c("min", "average", "first", "random", "max", "dense"),
  ...
)

Arguments

`x`	a dfm object
`n`	(optional) integer specifying the top `n` features to be returned, within group if `groups` is specified
`groups`	grouping variable, equal in length to the number of documents. It is evaluated within the docvars data.frame, so docvars may be referred to by name without quoting. This behaviour of `groups` differs from earlier versions; see `news(Version >= "3.0", package = "quanteda")` for details.
`ties_method`	character string specifying how ties are treated. See `base::rank()` for details. Unlike that function, however, the default is `"min"`, and rank 1 goes to the greatest frequency, so that frequencies of 11, 11, 10 would be ranked 1, 1, 3.
`...`	additional arguments passed to `dfm_group()`. This is useful for passing `force = TRUE`, for instance, when grouping a dfm that has been weighted.
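
The effect of `ties_method` can be previewed with `base::rank()` alone. The sketch below is not the package's code: it assumes ranks are computed on descending frequency (consistent with rank 1 marking the greatest frequency), and the `freq` vector is made up for illustration. `"dense"` is not a `base::rank()` option, so it is emulated here with `match()`.

```r
# Hypothetical feature frequencies with a tie for first place
freq <- c(a = 11, b = 11, c = 10)

# Negate so that the largest frequency receives rank 1 (assumed behaviour)
rank_min     <- rank(-freq, ties.method = "min")      # tied features share the lowest rank
rank_average <- rank(-freq, ties.method = "average")  # tied features share the mean rank
rank_first   <- rank(-freq, ties.method = "first")    # ties broken by order of appearance
rank_max     <- rank(-freq, ties.method = "max")      # tied features share the highest rank

# Dense ranking leaves no gap after a tie (emulated; not part of base::rank())
rank_dense <- match(-freq, sort(unique(-freq)))

# One row per method, one column per feature
print(rbind(min = rank_min, average = rank_average, first = rank_first,
            max = rank_max, dense = rank_dense))
```

The methods differ only in how the tied features `a` and `b` are numbered and in whether `c` follows at rank 2 or rank 3.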

Value

a data.frame containing the following variables:

feature: (character) the feature
frequency: count of the feature
rank: rank of the feature, where 1 indicates the greatest frequency
docfreq: document frequency of the feature, as a count (the number of documents in which this feature occurred at least once)
group: (only if groups is specified) the label of the group. If the features have been grouped, then all counts, ranks, and document frequencies are within group. If groups is not specified, the group column is omitted from the returned data.frame.

textstat_frequency returns a data.frame of features and their term and document frequencies within groups.
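
To make the meaning of the returned columns concrete, the following base R sketch recomputes them by hand from a small count matrix. It is an illustration only, not how the package computes its result: `summarise_features()` is a hypothetical helper, the matrix encodes the three short documents used in the Examples, and dropping zero-count features within a group is an assumption.

```r
# Feature counts for the documents "a a b b c d", "a d d d", "a a a"
counts <- matrix(
  c(2, 2, 1, 1,
    1, 0, 0, 3,
    3, 0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("text1", "text2", "text3"), c("a", "b", "c", "d"))
)

# Hypothetical helper: summarise one documents-by-features count matrix
summarise_features <- function(m) {
  out <- data.frame(
    feature   = colnames(m),
    frequency = colSums(m),      # total occurrences across documents
    docfreq   = colSums(m > 0),  # number of documents containing the feature
    row.names = NULL
  )
  out <- out[out$frequency > 0, ]                        # assumption: drop absent features
  out$rank <- rank(-out$frequency, ties.method = "min")  # 1 = most frequent
  out <- out[order(out$rank), c("feature", "frequency", "rank", "docfreq")]
  rownames(out) <- NULL
  out
}

# Ungrouped summary over all documents
overall <- summarise_features(counts)
print(overall)

# Grouped summary: counts, ranks, and document frequencies are within group
groups <- c("one", "two", "one")
by_group <- do.call(rbind, lapply(unique(groups), function(g) {
  cbind(summarise_features(counts[groups == g, , drop = FALSE]), group = g)
}))
rownames(by_group) <- NULL
print(by_group)
```

In the grouped table, feature `a` has frequency 5 and document frequency 2 in group "one" but frequency 1 in group "two", which shows that every statistic is recomputed inside each group.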

Examples

library("quanteda")
library("quanteda.textstats")  # provides textstat_frequency()
set.seed(20)
dfmat1 <- dfm(tokens(c("a a b b c d", "a d d d", "a a a")))

textstat_frequency(dfmat1)
textstat_frequency(dfmat1, groups = c("one", "two", "one"), ties_method = "first")
textstat_frequency(dfmat1, groups = c("one", "two", "one"), ties_method = "average")

dfmat2 <- corpus_subset(data_corpus_inaugural, President == "Obama") %>%
   tokens(remove_punct = TRUE) %>%
   tokens_remove(stopwords("en")) %>%
   dfm()
tstat1 <- textstat_frequency(dfmat2)
head(tstat1, 10)

dfmat3 <- head(data_corpus_inaugural) %>%
   tokens(remove_punct = TRUE) %>%
   tokens_remove(stopwords("en")) %>%
   dfm()
textstat_frequency(dfmat3, n = 2, groups = President)


## Not run: 
# plot 20 most frequent words
library("ggplot2")
ggplot(tstat1[1:20, ], aes(x = reorder(feature, frequency), y = frequency)) +
    geom_point() +
    coord_flip() +
    labs(x = NULL, y = "Frequency")

# plot relative frequencies by group
dfmat3 <- data_corpus_inaugural %>%
    corpus_subset(Year > 2000) %>%
    tokens(remove_punct = TRUE) %>%
    tokens_remove(stopwords("en")) %>%
    dfm() %>%
    dfm_group(groups = President) %>%
    dfm_weight(scheme = "prop")

# calculate relative frequency by president
tstat2 <- textstat_frequency(dfmat3, n = 10, groups = President)

# plot frequencies
ggplot(data = tstat2, aes(x = factor(nrow(tstat2):1), y = frequency)) +
    geom_point() +
    facet_wrap(~ group, scales = "free") +
    coord_flip() +
    scale_x_discrete(breaks = nrow(tstat2):1,
                       labels = tstat2$feature) +
    labs(x = NULL, y = "Relative frequency")

## End(Not run)

quanteda.textstats documentation built on Sept. 11, 2024, 6:39 p.m.
