reinert | R Documentation |
Segment clustering based on the Reinert method - Simple clustering
reinert(
  x,
  k = 10,
  term = "token",
  segment_size = 40,
  min_segment_size = 3,
  min_split_members = 5,
  cc_test = 0.3,
  tsj = 3
)
x |
tall data frame of documents |
k |
maximum number of clusters to compute |
term |
indicates the type of form, "lemma" or "token". Default value is term = "token". |
segment_size |
number of forms per segment. Default value is segment_size = 40 |
min_segment_size |
minimum number of forms per segment. Default value is min_segment_size = 3 |
min_split_members |
minimum number of segments in a cluster |
cc_test |
contingency coefficient value for feature selection |
tsj |
minimum frequency value for feature selection |
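The two feature-selection arguments, tsj and cc_test, can be illustrated with a small base R sketch. It assumes that the contingency coefficient is Pearson's C = sqrt(chi2 / (chi2 + n)), computed for each term against a two-way split of the segments. This page does not state the exact formula or how reinert() combines the two thresholds, so the toy matrix, the split, and the helper contingency_coef() are all hypothetical.

```r
# Hypothetical toy data: rows are segments, columns are terms (1 = term present).
dtm <- matrix(
  c(1, 1, 0,
    1, 0, 0,
    1, 1, 0,
    0, 1, 1,
    0, 0, 1,
    1, 0, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("whale", "ship", "sea"))
)

# Hypothetical two-way split of the six segments.
cluster <- c(1, 1, 1, 2, 2, 2)

# Pearson's contingency coefficient of one term against the split.
# Assumption: this is the statistic that cc_test is compared with.
# Note: a term present in all segments or in none gives NaN here.
contingency_coef <- function(present, cluster) {
  tab <- table(factor(present, levels = c(0, 1)), cluster)
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  chi2 <- sum((tab - expected)^2 / expected)
  sqrt(chi2 / (chi2 + sum(tab)))
}

# Number of segments containing each term, to compare with tsj.
freq <- colSums(dtm)

# Contingency coefficient of each term, to compare with cc_test.
cc <- apply(dtm, 2, contingency_coef, cluster = cluster)

# Flags against the default thresholds shown in the usage (tsj = 3, cc_test = 0.3).
frequent <- freq >= 3
above_cc <- cc >= 0.3

print(data.frame(freq, cc = round(cc, 3), frequent, above_cc))
```

The sketch only reports which terms reach each threshold; whether a flagged term is kept, dropped, or assigned to a single cluster depends on the implementation.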
See the references for original articles on the method. Special thanks to the authors of the rainette package (https://github.com/juba/rainette) for inspiring the coding approach used in this function.
The result is a list with classes hclust and reinert_tall.
Reinert M., Une méthode de classification descendante hiérarchique: application à l'analyse lexicale par contexte, Cahiers de l'analyse des données, Volume 8, Numéro 2, 1983. https://www.numdam.org/item/?id=CAD_1983__8_2_187_0
Reinert M., Alceste une méthodologie d'analyse des données textuelles et une application: Aurelia De Gerard De Nerval, Bulletin de Méthodologie Sociologique, Volume 26, Numéro 1, 1990. doi:10.1177/075910639002600103
Barnier J., Privé F., rainette: The Reinert Method for Textual Data Clustering, 2023. doi:10.32614/CRAN.package.rainette
data(mobydick)
res <- reinert(
  x = mobydick,
  k = 10,
  term = "token",
  segment_size = 40,
  min_segment_size = 5,
  min_split_members = 10,
  cc_test = 0.3,
  tsj = 3
)