MEM_RMSD: MEM RMSD similarity between populations

View source: R/MEM_RMSD.R

MEM_RMSD R Documentation

MEM RMSD similarity between populations

Description

MEM_RMSD calculates a normalized root-mean-square deviation (RMSD) score for every pair of populations, using their MEM scores as input. The score serves as a metric of similarity between populations.

For each pair of populations, the function sums the squared differences in MEM score across all shared markers, divides that sum by the number of markers, and takes the square root.

For markers "a" through "n", where a1 and a2 are the MEM scores of marker a in populations 1 and 2, the sum of squares is calculated as: sum of squares = (a2-a1)^2 + (b2-b1)^2 + ... + (n2-n1)^2

Root-mean-square deviation (RMSD) is calculated as: RMSD = sqrt(sum of squares/number of markers)
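As a minimal sketch of the two formulas above, the RMSD for one pair of populations can be computed by hand in base R. The marker names and MEM scores here are hypothetical values chosen for illustration, and this is not the package's internal code.

```r
# Hypothetical MEM scores for two populations over three shared markers
pop1 <- c(CD3 = 4, CD4 = 2, CD8 = -1)
pop2 <- c(CD3 = 1, CD4 = 2, CD8 = 3)

# Sum of squared differences across markers: (-3)^2 + 0^2 + 4^2 = 25
sum_of_squares <- sum((pop2 - pop1)^2)

# RMSD: square root of the sum of squares divided by the number of markers
rmsd <- sqrt(sum_of_squares / length(pop1))
```

With these values the sum of squares is 25 and the RMSD is sqrt(25/3).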

The RMSD values are then expressed as a percentage of the maximum RMSD in the matrix and subtracted from 100, so that the final score runs from 0 for the most dissimilar pair of populations to 100 for identical populations.

Percent_max_RMSD = 100-RMSD/max_RMSD*100
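The conversion can be illustrated on a small matrix. The sketch below assumes a hypothetical matrix `mem` with populations in rows and markers in columns, and uses base R's `dist()` to get the pairwise distances. It mirrors the formulas described here and is not taken from the package source.

```r
# Hypothetical MEM scores: populations in rows, markers in columns
mem <- rbind(
  popA = c(CD3 = 4, CD4 = 2, CD8 = -1),
  popB = c(CD3 = 1, CD4 = 2, CD8 = 3),
  popC = c(CD3 = 4, CD4 = 2, CD8 = 0)
)

# dist() returns Euclidean distances between rows; dividing by
# sqrt(number of markers) turns each distance into an RMSD
rmsd <- as.matrix(dist(mem)) / sqrt(ncol(mem))

# Rescale so the largest RMSD maps to 0 and identical populations map to 100
percent_max_rmsd <- 100 - rmsd / max(rmsd) * 100
```

In this toy matrix popA and popB are the most dissimilar pair and score 0, every population scores 100 against itself, and popA versus popC scores 80 because their RMSD is one fifth of the maximum.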

The function then outputs a clustered heatmap of Percent_max_RMSD values and the matrix of numerical values used to build the heatmap.

Usage

MEM_RMSD(
  MEM_matrix,
  format=NULL,
  output.matrix=FALSE)

Arguments

MEM_matrix

The input to MEM_RMSD can be either 1) a matrix of values, where populations are in rows and their MEM scores are in columns, 2) the list of matrices output by MEM, or 3) a file path pointing to a folder containing tab-delimited text files, one file for each population, where each file lists marker names in the first column and the corresponding MEM scores in the second column.

format

Default is NULL. When format is equal to "pop files", the function expects a file path as input where the designated folder contains one file for each population's set of MEM scores.

output.matrix

If TRUE, the function writes the RMSD heatmap (PDF) and a txt file containing the matrix of calculated values to a folder called "output files", which is created in the working directory.

Details

If you are calculating MEM_RMSD on population files, the populations do not all have to include the same markers. The function determines which markers each pair of populations has in common and uses only those common markers to calculate RMSD. If the populations have no markers in common, the function terminates with an error. Note that marker names must match exactly between files in order for them to be considered the same.
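The shared-marker behaviour described above can be sketched with `intersect()` on marker names. The vectors `pop1` and `pop2` and the error message are hypothetical stand-ins for two population files. The package's own matching logic may differ in detail.

```r
# Hypothetical per-population MEM scores, named by marker
pop1 <- c(CD3 = 4, CD4 = 2, CD19 = -3)
pop2 <- c(CD3 = 1, CD8 = 3, CD4 = 2)

# Markers present in both populations; names are compared exactly,
# so "CD4" and "cd4" would not count as the same marker
shared <- intersect(names(pop1), names(pop2))
if (length(shared) == 0) stop("populations share no markers")

# RMSD over the shared markers only, aligned by name rather than position
rmsd <- sqrt(mean((pop2[shared] - pop1[shared])^2))
```

Here only CD3 and CD4 are shared, so CD19 and CD8 are ignored and the RMSD is computed from two markers.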

Value

RMSD_vals

Matrix of the calculated pairwise percent max RMSD scores

RMSD heatmap

Hierarchically clustered heatmap of RMSD_vals

Author(s)

Kirsten Diggins, Sierra Lima, Jonathan Irish

References

Diggins et al., Nature Methods, 2017

Examples

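## For a single matrix, pass the data directly to MEM_RMSD

library(cytoMEM)

## User inputs: load the example matrix of MEM scores
data(MEM_matrix)

## Compute pairwise percent max RMSD scores and draw the clustered heatmap
MEM_RMSD(
  MEM_matrix,
  format=NULL,
  output.matrix=FALSE)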

cytolab/cytoMEM documentation built on Sept. 13, 2023, 7:28 a.m.