component_similarity: Extracts different component/factor similarity (matching)...

View source: R/utility_functions.R

component_similarity    R Documentation

Extracts different component/factor similarity (matching) indices

Description

Given a list of loadings for a set of factors or components, computes Pearson's correlation coefficient (r), the coefficient of congruence (CC), Cattell's s index, and the root mean square error (RMSE) between them.

Usage

component_similarity(
  load.list,
  s_cut_off = 0.4,
  ndim = 5,
  similarity_metric = "all"
)

Arguments

load.list

List of factors to match. Each element of the list is a p x m matrix, where p is the number of variables and m the number of factors or components. All matrices must have variables and components in the same order.

s_cut_off

Numeric. Loading cut-off used to determine whether a variable is salient or not in Cattell's terms. Default=0.4

ndim

Numeric. Number of PCs to compute the similarity from. Default=5

similarity_metric

Character or character vector. Possible values are "cc_index" (congruence coefficient), "r_correlation" (Pearson's r), "rmse" (root mean squared error), "s_index" (Cattell's s index), or "all". Default="all". See below for details on calculations.

Details

This function is internally called by pc_stability(). Each metric is computed using an external function:

"cc_index"(extract_cc()) function

The congruence coefficient is calculated as:

CC_{x,y} = sum(x_{i} X y_{i}) / sqrt(sum(x_{i}^2) X sum(y_{i}^2))

Where x_{i} and y_{i} are the loadings of the variable i on the component or factor x and y respectively. CC is equivalent to the cosine of the angle between two vectors (the cosine similarity metric) and has a numerical range from -1 to 1. The sign of a component is arbitrary and can be flipped without affecting its interpretation, so here we consider the absolute value of CC (0 to 1). The closer the CC is to 1, the more similar the two components are (see refs 1, 2).
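The formula above can be sketched in a few lines of base R. This is a minimal illustration only: congruence() is a hypothetical helper written for this example, not the package's extract_cc(), and the loading vectors are made up.

```r
# Hypothetical helper (not the package's extract_cc()): congruence coefficient
# between two loading vectors of equal length.
congruence <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# Made-up loadings of four variables on one component
x <- c(0.8, 0.6, -0.1, 0.3)
y <- -x  # the same component with its sign flipped

cc_signed <- congruence(x, y)  # signed CC, in [-1, 1]
cc_abs <- abs(cc_signed)       # absolute CC, in [0, 1], ignores the sign flip
```

Taking the absolute value makes a component and its sign-flipped copy count as a perfect match, which is the behaviour described above.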

"r_correlation"(cor()) function

Pearson's r between two vectors of component loadings has also been used as a similarity metric for component/factor matching (ref 3). It is calculated here using the cor() function.
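As a small illustration with made-up loading vectors, cor() from base R gives Pearson's r directly. The manual calculation is included only to show how r relates to the congruence coefficient: r is the same ratio computed after centering each vector on its mean. How the package post-processes the result of cor() is not shown here.

```r
# Made-up loadings of four variables on two components
x <- c(0.8, 0.6, -0.1, 0.3)
y <- c(0.7, 0.5, 0.0, 0.4)

# Pearson's r (method = "pearson" is the default of cor())
r <- cor(x, y)

# Same value by hand: center each vector, then apply the congruence formula
xc <- x - mean(x)
yc <- y - mean(y)
r_manual <- sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))
```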

"rmse"(extract_rmse()) function

RMSE has also been used as a metric for factor matching (see ref 3). It is calculated as:

RMSE_{x,y} = sqrt( sum((x_{i}-y_{i})^2) / n)

Where n is the number of variables in both components x and y. An RMSE of 0 corresponds to a perfect match. The smaller the RMSE is, the more similar the two components are.
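A minimal base R sketch of the RMSE formula follows. rmse() is a hypothetical helper for illustration, not the package's extract_rmse(), and the loadings are made up.

```r
# Hypothetical helper (not the package's extract_rmse()): root mean square
# error between two loading vectors of equal length.
rmse <- function(x, y) {
  stopifnot(length(x) == length(y))
  n <- length(x)  # number of variables
  sqrt(sum((x - y)^2) / n)
}

# Made-up loadings of four variables on two components
x <- c(0.8, 0.6, -0.1, 0.3)
y <- c(0.7, 0.5, 0.0, 0.4)

rmse_same <- rmse(x, x)  # identical loadings: perfect match
rmse_xy <- rmse(x, y)    # every loading differs by 0.1
```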

"s_index"(extract_s()) function

The s index was first suggested by Cattell et al. It is based on the factor mandate matrix (ref 4), in which a loading is 1 if a component is considered to act on a variable (a salient variable) or 0 if not (forming the hyperplane space). Cattell suggested an arbitrary cut-off of ±0.1 for a variable to be considered salient. In practice, one might want to alter the threshold depending on the experimental conditions; here it is set with s_cut_off.
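The text above does not spell out the formula, so the following is only a sketch under stated assumptions. It assumes the signed form of the index commonly attributed to ref 4: variables are recoded as +1 (positive salient), -1 (negative salient) or 0 (hyperplane), and s = (same - opposite) / (same + opposite + 0.5 X one_only), where the counts are the variables salient with the same sign in both components, salient with opposite signs, and salient in only one component. s_index() is a hypothetical helper, not the package's extract_s(), and the loadings are made up.

```r
# Hypothetical helper (not the package's extract_s()); the formula is an
# assumption based on the commonly cited form of Cattell's s index.
s_index <- function(x, y, cut_off = 0.4) {
  stopifnot(length(x) == length(y))
  # Recode loadings: +1 / -1 for salient variables, 0 for the hyperplane
  sx <- ifelse(abs(x) >= cut_off, sign(x), 0)
  sy <- ifelse(abs(y) >= cut_off, sign(y), 0)
  same <- sum(sx != 0 & sx == sy)         # salient in both, same sign
  opposite <- sum(sx != 0 & sx == -sy)    # salient in both, opposite sign
  one_only <- sum(xor(sx != 0, sy != 0))  # salient in only one component
  # Returns NaN when no variable is salient in either component
  (same - opposite) / (same + opposite + 0.5 * one_only)
}

# Made-up loadings of four variables on two components
x <- c(0.8, 0.6, -0.1, 0.3)
y <- c(0.7, 0.1, -0.5, 0.2)

s_xy <- s_index(x, y)   # one shared salient variable, two salient in one only
s_same <- s_index(x, x) # identical salient pattern
```

Lowering cut_off marks more variables as salient, which is why the threshold is worth adjusting to the data at hand.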

Value

Returns a list of three objects: index_all, index_mean and index_sd. index_all contains the comparisons between all pairs of elements of load.list. In general, similarity is calculated between two matrices of loadings, but the user can extract all the comparisons when length(load.list) > 2. index_mean is the average of each similarity metric across all the comparisons. It is the same as the individual metric (index_all) when length(load.list) == 2, because only a single comparison is made in that scenario. index_sd is the standard deviation of each metric across the comparisons when length(load.list) > 2.

Author(s)

Abel Torres Espin

References

  1. Burt C. The Factorial Study of Temperamental Traits. Br J Stat Psychol. 1948;1(3):178–203.

  2. Tucker LR. A method for synthesis of factor analysis studies. Personnel Research Section Report No. 984. Washington D.C.: Department of the Army; 1951.

  3. Guadagnoli E, Velicer W. A Comparison of Pattern Matching Indices. Multivar Behav Res. 1991 Apr;26(2):323–43.

  4. Cattell RB, Balcar KR, Horn JL, Nesselroade JR. Factor Matching Procedures: an Improvement of the s Index; with Tables. Educ Psychol Meas. 1969 Dec;29(4):781–92.

Examples

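# Assumes the syndRomics package is installed; it provides stand_loadings()
# and component_similarity()
library(syndRomics)

data(mtcars)
pca_mtcars_1 <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Second PCA on a subset of mtcars, as an example of comparing loading
# patterns from two closely related datasets
pca_mtcars_2 <- prcomp(mtcars[1:20, ], center = TRUE, scale. = TRUE)

# Loadings for each PCA, computed from the data each PCA was fitted on
s.loadings_1 <- stand_loadings(pca = pca_mtcars_1, pca_data = mtcars)
s.loadings_2 <- stand_loadings(pca = pca_mtcars_2, pca_data = mtcars[1:20, ])

# Uses the defaults: ndim = 5 and similarity_metric = "all"
component_similarity(load.list = list(s.loadings_1, s.loadings_2))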


ucsf-ferguson-lab/syndRomics documentation built on June 26, 2022, 5:36 p.m.