fst_ngrams_table2: Make Top N-grams Table 2

View source: R/02_data_exploration.R

fst_ngrams_table2    R Documentation

Make Top N-grams Table 2

Description

Creates a table of the most frequently occurring n-grams within the data. Optionally, weights can be provided either through a 'weight' column in the formatted data, or from a 'svydesign' object with the raw (preformatted) data. Equivalent to 'fst_get_top_ngrams' but doesn't print a message about ties.

Usage

fst_ngrams_table2(
  data,
  number = 10,
  ngrams = 1,
  norm = NULL,
  pos_filter = NULL,
  strict = TRUE,
  use_svydesign_weights = FALSE,
  id = "",
  svydesign = NULL,
  use_column_weights = FALSE
)

Arguments

data

A dataframe of text in CoNLL-U format, with optional additional columns.

number

The number of n-grams to return, default is '10'.

ngrams

The size of n-grams to return (for example, '1' for single words or '2' for bigrams), default is '1'.

norm

The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses), '"number_resp"' (the number of responses), or 'NULL' (raw count returned, default).

pos_filter

List of UPOS tags for inclusion, default is 'NULL', which means all word types are included.

strict

Whether to cut off strictly at 'number' (ties are ordered alphabetically), default is 'TRUE'.

use_svydesign_weights

Option to weight words in the table using weights from a 'svydesign' containing the raw data, default is 'FALSE'.

id

Name of the ID column in the raw data, required if 'use_svydesign_weights = TRUE'. It must match the 'docid' in formatted 'data'.

svydesign

A 'svydesign' which contains the raw data and weights, required if 'use_svydesign_weights = TRUE'.

use_column_weights

Option to weight words in the table using weights from formatted data which includes an additional 'weight' column, default is 'FALSE'.
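
As a rough illustration of what weighting by a 'weight' column could mean, the base R sketch below sums the weights of the rows in which each word occurs instead of counting the rows. It is not the package's implementation: the data frame 'toy', its column names ('doc_id', 'lemma', 'weight') and all of its values are invented for this example.

```r
# Toy token-level data: one row per token, with the weight of the
# response repeated on every row of that response.
# Column names and values are hypothetical, not taken from the package.
toy <- data.frame(
  doc_id = c("r1", "r1", "r2", "r2", "r3"),
  lemma  = c("koulu", "kaveri", "koulu", "opettaja", "kaveri"),
  weight = c(0.5, 0.5, 2.0, 2.0, 1.0)
)

# Unweighted: count how many rows contain each lemma.
unweighted <- table(toy$lemma)

# Weighted: sum the weights of the rows containing each lemma.
weighted <- vapply(split(toy$weight, toy$lemma), sum, numeric(1))

# Put the largest weighted totals first.
weighted <- sort(weighted, decreasing = TRUE)
print(weighted)
```

In this toy data "koulu" and "kaveri" both occur twice, but "koulu" ranks higher once weights are applied because one of its occurrences comes from a heavily weighted response.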

Value

A table of the most frequently occurring n-grams in the data.
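
To make the shape of such a table concrete, here is a minimal base R sketch of counting words, normalising the counts, and applying a cut-off. It only illustrates the ideas behind 'norm' and 'strict' under stated assumptions and is not the package's code: the toy responses, the column names 'ngram' and 'count', and the assumption that a non-strict cut-off keeps every entry tied with the last retained one are all hypothetical.

```r
# Toy responses, already split into words (invented data).
responses <- list(
  r1 = c("a", "b", "c"),
  r2 = c("a", "b"),
  r3 = c("a", "c", "d")
)
words <- unlist(responses, use.names = FALSE)

# Raw counts per word (unigrams), as a data frame.
counts <- as.data.frame(table(words), stringsAsFactors = FALSE)
names(counts) <- c("ngram", "count")

# Two possible normalisations: by total words, or by number of responses.
counts$per_word <- counts$count / length(words)
counts$per_resp <- counts$count / length(responses)

# Order by descending count, breaking ties alphabetically.
counts <- counts[order(-counts$count, counts$ngram), ]

n <- 2
# Strict cut-off: exactly n rows, ties resolved by the alphabetical order.
strict_top <- head(counts, n)
# Assumed non-strict cut-off: keep everything tied with the n-th row.
loose_top <- counts[counts$count >= counts$count[n], ]

print(strict_top)
print(loose_top)
```

With these toy responses, "b" and "c" are tied for second place, so the strict table has two rows while the non-strict one has three.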

Examples

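# Top 10 single words, raw counts
fst_ngrams_table2(fst_child, norm = NULL)
# Top 10 bigrams, normalised by the number of responses
fst_ngrams_table2(fst_child, ngrams = 2, norm = "number_resp")
# Top 10 bigrams, weighted using a svydesign built from the raw data
cb <- fst_child_2
s <- survey::svydesign(id = ~1, weights = ~paino, data = child)
i <- "fsd_id"
fst_ngrams_table2(cb, 10, 2, use_svydesign_weights = TRUE, svydesign = s, id = i)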

finnsurveytext documentation built on April 4, 2025, 5:07 a.m.