Given a data frame of texts, returns a clustering of its documents (or features).
df |
a data frame with at least one column of textual data and one column of document IDs |
docid_field |
name of the column (in quotation marks) containing the IDs of the documents (default NULL) |
text_field |
name of the column (in quotation marks) containing textual data |
min_docfreq |
minimum value of a feature's document frequency, below which features will be removed (default 0.5 percentile) |
max_docfreq |
maximum value of a feature's document frequency, above which features will be removed (default 99 percentile) |
tfidf |
whether to apply term frequency-inverse document frequency (tf-idf) weighting (default TRUE) |
element |
elements to cluster. Available options are "documents" and "features" (default "documents") |
k |
desired number of clusters (default NULL). If NULL, the silhouette method is used to estimate the appropriate number of clusters |
k.max |
maximum number of clusters to consider if k is not specified (default NULL) |
nstart |
number of initial configurations (default 25) |
method |
clustering method. Available options are "kmeans" and "hclust" (default "kmeans") |
hc_method |
the agglomeration method to be used in case of "hclust" method. This should be one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). See hclust |
return_fit |
whether to return the fitted cluster_model in the main environment |
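When k is NULL, the number of clusters is estimated with the silhouette method. The sketch below illustrates that idea in general terms and is not the package's own code: it runs kmeans for each candidate k up to a limit and keeps the k with the largest average silhouette width. The toy matrix x, the names k_max, avg_sil and best_k, and the use of cluster::silhouette are all assumptions made for illustration.

```r
library(cluster)  # provides silhouette(); assumed to be installed

set.seed(42)
# Toy numeric matrix standing in for a weighted document-feature matrix:
# three well-separated groups of 20 points each.
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

k_max <- 6      # plays the role of k.max
d <- dist(x)    # Euclidean distances between rows

# Average silhouette width for each candidate k (undefined for k = 1).
avg_sil <- sapply(2:k_max, function(k) {
  fit <- kmeans(x, centers = k, nstart = 25)
  mean(silhouette(fit$cluster, d)[, "sil_width"])
})
names(avg_sil) <- 2:k_max

# The candidate with the highest average silhouette width is selected.
best_k <- as.integer(names(avg_sil)[which.max(avg_sil)])
print(avg_sil)
print(best_k)
```

A larger k.max widens the search at the cost of more kmeans runs, each of which is repeated nstart times.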
The function is essentially a wrapper around functions available in quanteda and factoextra. Please refer to the documentation of textstat_simil and eclust. The fitted cluster_model can be used to create k-means plots with fviz_cluster or dendrograms (hclust) with fviz_dend.
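To give a feel for the kind of pipeline being wrapped, here is a minimal sketch that builds a tf-idf weighted document-feature matrix with quanteda and clusters it with factoextra's eclust. It is an approximation under stated assumptions, not the function's actual implementation: the toy data frame is hypothetical, and the trimming by document frequency and the textstat_simil step mentioned above are left out.

```r
library(quanteda)    # corpus(), tokens(), dfm(), dfm_tfidf()
library(factoextra)  # eclust(), fviz_cluster()

# Hypothetical toy data; the column names mirror the Examples section.
df <- data.frame(
  documents = paste0("doc", 1:6),
  texts = c("cats purr and sleep", "dogs bark and run",
            "cats sleep all day", "dogs run in the park",
            "stocks rise on earnings", "markets and stocks fall"),
  stringsAsFactors = FALSE
)

# Build the corpus, tokenize, and create a tf-idf weighted matrix.
corp <- corpus(df, docid_field = "documents", text_field = "texts")
mat  <- dfm_tfidf(dfm(tokens(corp, remove_punct = TRUE)))

# Cluster the documents (rows) into 3 groups with k-means.
# graph = FALSE suppresses eclust's automatic plot.
fit <- eclust(as.matrix(mat), FUNcluster = "kmeans",
              k = 3, nstart = 25, graph = FALSE)

fit$cluster              # named vector of cluster IDs, one per document
p <- fviz_cluster(fit)   # ggplot object; print(p) draws the cluster plot
```

For hierarchical clustering, the analogous call would use FUNcluster = "hclust" together with hc_method, and the resulting object can be passed to fviz_dend.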
A vector of cluster IDs. Silhouette information, including cluster sizes and average silhouette widths, is printed to the console.
## Not run:
df$cluster <- clusterizer(df, docid_field = "documents", text_field = "texts", k = 10)
## End(Not run)