reinert: Segment clustering based on the Reinert method - Simple...

View source: R/reinert.R

reinert    R Documentation

Segment clustering based on the Reinert method - Simple clustering

Description

Segment clustering based on the Reinert method - Simple clustering

Usage

reinert(
  x,
  k = 10,
  term = "token",
  segment_size = 40,
  min_segment_size = 3,
  min_split_members = 5,
  cc_test = 0.3,
  tsj = 3
)

Arguments

x

tall data frame of documents

k

maximum number of clusters to compute

term

type of form to use, either "lemma" or "token". Default value is term = "token".

segment_size

number of forms per segment. Default value is segment_size = 40.

min_segment_size

minimum number of forms per segment. Default value is min_segment_size = 3.

min_split_members

minimum number of segments in a cluster

cc_test

contingency coefficient value for feature selection

tsj

minimum frequency value for feature selection
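
The two feature-selection arguments can be illustrated with a small base R sketch. It assumes that cc_test is compared against Pearson's contingency coefficient, C = sqrt(chi2 / (chi2 + n)), and that tsj is a lower bound on a term's total frequency; the page does not confirm either detail, and the helpers contingency_coef() and keep_term() are hypothetical names that are not part of the package.

```r
# Pearson's contingency coefficient for a contingency table.
# ASSUMPTION: reinert() may use a different statistic internally.
contingency_coef <- function(tab) {
  n <- sum(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / n
  chi2 <- sum((tab - expected)^2 / expected)
  sqrt(chi2 / (chi2 + n))
}

# Hypothetical filter: keep a term only if it is frequent enough (tsj)
# and associated strongly enough with the split (cc_test).
keep_term <- function(tab, cc_test = 0.3, tsj = 3) {
  term_freq <- sum(tab["present", ])
  term_freq >= tsj && contingency_coef(tab) >= cc_test
}

# Toy 2 x 2 tables: rows = term present/absent, columns = two clusters
dims <- list(c("present", "absent"), c("cl1", "cl2"))
strong <- matrix(c(10, 0, 0, 10), nrow = 2, dimnames = dims)
flat   <- matrix(c(5, 5, 5, 5), nrow = 2, dimnames = dims)

keep_term(strong)  # TRUE: the term occurs only in cl1
keep_term(flat)    # FALSE: the term is spread evenly, so C = 0
```

Under these assumptions, raising cc_test or tsj keeps fewer terms for the clustering.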

Details

See the references for original articles on the method. Special thanks to the authors of the rainette package (https://github.com/juba/rainette) for inspiring the coding approach used in this function.

Value

A list with classes hclust and reinert_tall.
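
What carrying both classes means in practice can be sketched with a toy object built in base R. The object below is only a stand-in, not real reinert() output: the class order and the assumption that generic hclust tools such as cutree() apply to the real result are not confirmed by this page.

```r
# Toy stand-in for a dual-class result (NOT actual reinert() output)
toy <- hclust(dist(matrix(c(1, 2, 6, 7, 15), ncol = 1)))
obj <- structure(toy, class = c("reinert_tall", "hclust"))  # class order is assumed

# Both class checks succeed on the same object
inherits(obj, "hclust")
inherits(obj, "reinert_tall")

# Tools written for hclust objects still work on the toy object
groups <- cutree(obj, k = 2)
groups
```

On a real result, the same inherits() calls offer a quick way to confirm which classes are present before relying on hclust methods.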

References

  • Reinert M., Une méthode de classification descendante hiérarchique: application à l'analyse lexicale par contexte, Cahiers de l'analyse des données, Volume 8, Numéro 2, 1983. https://www.numdam.org/item/?id=CAD_1983__8_2_187_0

  • Reinert M., Alceste une méthodologie d'analyse des données textuelles et une application: Aurelia De Gerard De Nerval, Bulletin de Méthodologie Sociologique, Volume 26, Numéro 1, 1990. doi:10.1177/075910639002600103

  • Barnier J., Privé F., rainette: The Reinert Method for Textual Data Clustering, 2023. doi:10.32614/CRAN.package.rainette

Examples


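library(tall)

# Load the example corpus shipped with the package
data(mobydick)

# Cluster 40-token segments into at most 10 clusters
res <- reinert(
  x = mobydick,
  k = 10,
  term = "token",
  segment_size = 40,
  min_segment_size = 5,
  min_split_members = 10,
  cc_test = 0.3,
  tsj = 3
)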



tall documentation built on April 16, 2025, 5:10 p.m.