clusterms: Get terms by cluster

View source: R/clusterms.R

Description

Given a data frame of documents with cluster assignments, returns the top words for each cluster.

Usage

clusterms(
  df,
  cluster_field = NULL,
  docid_field = NULL,
  text_field = NULL,
  clean = FALSE,
  lang = NULL,
  n = 10
)

Arguments

df

a data frame with at least one column of textual data, plus columns holding the cluster and document IDs

cluster_field

name of the column (in quotation marks) containing the clusters' IDs (default NULL)

docid_field

name of the column (in quotation marks) containing the documents' IDs (default NULL)

text_field

name of the column (in quotation marks) containing textual data

clean

whether to remove stopwords, punctuation, symbols, etc. from the text (default FALSE)

lang

if clean = TRUE, the language of the stopwords must be specified. The following languages are supported: danish, dutch, english, finnish, french, german, hungarian, italian, norwegian, portuguese, russian, spanish, and swedish

n

number of top words to return (default 10)

Details

The most specific words of each cluster are computed with the chi-squared statistic, as implemented in textstat_keyness.
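To illustrate the idea behind chi-squared keyness, here is a minimal base R sketch. It is not the package's implementation: it uses chisq.test (with its default continuity correction) instead of textstat_keyness, and the toy data frame docs and the helper keyness_chi2 are hypothetical names introduced only for this example. For each word, it compares the word's count in the target cluster with its count in all other clusters, and gives the score a negative sign when the word is under-represented in the target.

```r
# Toy corpus: two clusters of short documents (hypothetical data).
docs <- data.frame(
  cluster = c("a", "a", "b", "b"),
  texts = c("cats purr and cats sleep", "cats chase mice",
            "dogs bark and dogs run", "dogs chase cats"),
  stringsAsFactors = FALSE
)

# Count how often each word occurs in each cluster.
tokens <- strsplit(tolower(docs$texts), "[^a-z]+")
counts <- table(
  word = unlist(tokens),
  cluster = rep(docs$cluster, lengths(tokens))
)

# Signed chi-squared score of every word for one target cluster
# versus all remaining clusters (hypothetical helper, not part of textools).
keyness_chi2 <- function(counts, target) {
  in_target <- counts[, target]
  in_rest <- rowSums(counts) - in_target
  n_target <- sum(in_target)
  n_rest <- sum(in_rest)
  scores <- vapply(seq_along(in_target), function(i) {
    # 2x2 table: this word vs. all other words, target vs. rest.
    tab <- matrix(
      c(in_target[i], n_target - in_target[i],
        in_rest[i], n_rest - in_rest[i]),
      nrow = 2
    )
    # Small expected counts trigger a warning that is irrelevant for a toy example.
    chi2 <- suppressWarnings(chisq.test(tab)$statistic)
    expected <- (in_target[i] + in_rest[i]) * n_target / (n_target + n_rest)
    # A negative sign marks words that are under-represented in the target.
    if (in_target[i] < expected) -chi2 else chi2
  }, numeric(1))
  names(scores) <- rownames(counts)
  sort(scores, decreasing = TRUE)
}

top_a <- head(keyness_chi2(counts, "a"), 3)
top_b <- head(keyness_chi2(counts, "b"), 3)
```

Words with the highest positive scores are the ones most characteristic of the target cluster; in this toy corpus that is "cats" for cluster a and "dogs" for cluster b.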

Value

a data frame with the most frequent and specific words of each cluster

Examples

## Not run: 
top_terms <- clusterms(df, cluster_field = "cluster", text_field = "texts")
## End(Not run)

nicolarighetti/textools documentation built on Oct. 16, 2021, 11:20 p.m.