create_profiles: Create profiles of observed variables using two-step cluster...

Description

Create profiles of observed variables using two-step cluster analysis

Usage

create_profiles(df, ..., n_profiles, to_center = FALSE, to_scale = FALSE,
  distance_metric = "squared_euclidean", linkage = "complete",
  plot_centered_data = FALSE, plot_raw_data = FALSE)

Arguments

df

A data frame with two or more columns of continuous variables

...

unquoted variable names separated by commas

n_profiles

The specified number of profiles to be found for the clustering solution

to_center

Boolean (TRUE or FALSE) for whether to center the raw data so that M = 0

to_scale

Boolean (TRUE or FALSE) for whether to scale the raw data so that SD = 1

distance_metric

Distance metric to use for hierarchical clustering; "squared_euclidean" is the default, but more options are available (see ?dist)

linkage

Linkage method to use for hierarchical clustering; "complete" is the default, but more options are available (see ?hclust)

plot_centered_data

Boolean (TRUE or FALSE) for whether to center the data before plotting. Use this only when to_center = FALSE, for cases in which raw data are used to create the profiles but centered profiles are desired for visualization; it should not be used when to_center = TRUE.

plot_raw_data

Boolean (TRUE or FALSE) for whether to plot the raw data, regardless of whether the data are centered or scaled before clustering.

Details

Function to create a specified number of profiles of observed variables using a two-step (hierarchical and k-means) cluster analysis.
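The two-step idea can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's internal code: it assumes the hierarchical solution is cut into n_profiles groups and that the group means are passed to k-means as starting centers. The object names (vars, hc_groups, start_centers) are hypothetical.

```r
# Illustrative sketch only; not the internals of create_profiles().
vars <- scale(mtcars[, c("disp", "hp", "wt")])  # center (M = 0) and scale (SD = 1)
n_profiles <- 2

# Step 1: hierarchical clustering on squared Euclidean distances
# with complete linkage (the defaults named in Usage).
d  <- dist(vars, method = "euclidean")^2
hc <- hclust(d, method = "complete")
hc_groups <- cutree(hc, k = n_profiles)

# Assumed hand-off: the mean of each variable within each hierarchical
# cluster (one row per cluster, one column per variable).
start_centers <- apply(vars, 2, function(x) tapply(x, hc_groups, mean))

# Step 2: k-means, started from the hierarchical solution rather than
# from randomly chosen cases.
km <- kmeans(vars, centers = start_centers)

table(km$cluster)  # number of cases assigned to each profile
```

Starting k-means from the hierarchical centroids makes the result reproducible without setting a random seed, which is one common motivation for a two-step approach.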

Value

A list containing the prepared data, the output from the hierarchical and k-means cluster analyses, the R-squared value, the raw clustered data, the processed clustered data of cluster centroids, and a ggplot object.
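For orientation, one common definition of R-squared for a cluster solution is the between-cluster sum of squares divided by the total sum of squares. Whether the function computes its R-squared value in exactly this way is an assumption; the sketch below uses only base R's kmeans() output, and the object names are hypothetical.

```r
# Illustrative sketch only: R-squared as the share of total variance
# accounted for by cluster membership (assumed definition).
vars <- scale(mtcars[, c("disp", "hp", "wt")])

set.seed(1)  # kmeans() picks random starting cases when given a count
km <- kmeans(vars, centers = 2, nstart = 10)

# betweenss + tot.withinss equals totss, so this is also
# 1 - km$tot.withinss / km$totss
r_squared <- km$betweenss / km$totss
r_squared
```

Values closer to 1 indicate that the profiles account for more of the variability in the clustered variables.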

Examples

create_profiles(mtcars, disp, hp, wt, n_profiles = 2, to_scale = TRUE)

Example output

Prepared data: Removed 0 incomplete cases
Hierarchical clustering carried out on: 32 cases
K-means algorithm converged: 1 iteration
Clustered data: Using a 2 cluster solution
Calculated statistics: R-squared = 0.654

prcr documentation built on May 2, 2019, 4:01 p.m.