calc_PCcors: Calculate correlations of principal components with quality...

calc_PCcors    R Documentation

Calculate correlations of principal components with quality or annotation variables

Description

This is a helper function to facilitate calculating correlations of sample-specific variables with principal components (generally from RNAseq data). By default, it uses Spearman (rank) correlations for continuous variables and intraclass correlations (as implemented in ICC::ICCbare) for categorical variables. The correlation method can be changed for continuous variables, but not currently for categorical variables.
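The two kinds of correlation described above can be sketched with base R and the ICC package. This is only an illustration on simulated data, not the function's actual source: the objects expr, annot, cont_cors, and icc_PC1 are made up for the example, and the ICC step is skipped if that package is not installed.

```r
# Simulated stand-in for expression data: 20 samples (rows) x 50 features
set.seed(1)
expr <- matrix(rnorm(20 * 50), nrow = 20,
               dimnames = list(paste0("lib", 1:20), paste0("gene", 1:50)))
pca <- prcomp(expr)

# Hypothetical annotation with one continuous and one categorical variable
annot <- data.frame(
  libid = rownames(expr),
  quality = runif(20),
  batch = rep(c("A", "B"), each = 10)
)

# Continuous variable: Spearman (rank) correlation with each of the first 3 PCs
cont_cors <- cor(pca$x[, 1:3], annot$quality, method = "spearman")

# Categorical variable: intraclass correlation of PC1 scores within batches,
# computed only if the ICC package is available
if (requireNamespace("ICC", quietly = TRUE)) {
  icc_df <- data.frame(batch = factor(annot$batch), PC1 = pca$x[, "PC1"])
  icc_PC1 <- ICC::ICCbare(x = batch, y = PC1, data = icc_df)
}
```

Here cont_cors is a 3 x 1 matrix with one row per PC; calc_PCcors is described as assembling such values across all retained annotation variables.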

Usage

calc_PCcors(
  PCA_result,
  annotation,
  PCs = 1:10,
  id_col = "libid",
  var_cols,
  ignore_unique_nonnumeric = TRUE,
  ignore_invariant = TRUE,
  date_as_numeric = TRUE,
  min_libs = 5,
  cont_method = "spearman",
  cat_method = "ICC",
  ...
)

Arguments

PCA_result

result of a principal component analysis, generally of gene expression data. Typically the output of prcomp or calc_PCAs. Can also be a matrix with samples in rows and dimensions in columns.

annotation

a data frame containing annotation data for the samples. May include clinical data, sample quality metrics, etc.

PCs

numeric vector of principal component axes to include in correlation calculations. Defaults to 1:10, which will calculate for the first 10 PCs. Any PCs specified that are not found in the PCA object will not be compared.

id_col

name or number of the column of annotation containing the library identifiers. These are matched to the rownames of PCA_result. Ignored if PCA_result does not have rownames.

var_cols

numbers or names of columns to include in the correlation calculations. If not specified, all columns will be included, subject to other exclusion criteria.

ignore_unique_nonnumeric

logical, whether to drop columns from annotation if they contain unique non-numeric values. Correlations for such variables are meaningless. Defaults to TRUE.

ignore_invariant

logical, whether to drop columns from annotation if all non-NA values are identical. Correlations for such variables are meaningless. Defaults to TRUE.

date_as_numeric

logical, whether to treat data of class "Date" and "POSIXt" as numeric. If set to FALSE, dates are treated as categorical variables. Defaults to TRUE.

min_libs

number, the minimum number of libraries containing non-NA values for a variable. Variables in annotation with fewer non-NA values will be dropped. Defaults to 5.

cont_method

character, the name of the correlation coefficient to use for continuous variables. Passed to stats::cor, and must be one of "pearson", "kendall", or "spearman", or abbreviations thereof. Defaults to "spearman".

cat_method

character, the name of the correlation coefficient to use for categorical variables. Currently, the only acceptable option is "ICC", which uses the intraclass correlation coefficient as implemented in ICC::ICCbare.

...

(optional) additional arguments passed to cor or other functions.
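The exclusion criteria controlled by ignore_unique_nonnumeric, ignore_invariant, and min_libs can be pictured with a small base R sketch. The helper keep_var and the toy annotation below are hypothetical and written only from the argument descriptions above; the package's own implementation may differ in detail.

```r
# Toy annotation; column contents are invented for illustration
annot <- data.frame(
  libid = paste0("lib", 1:8),                   # unique, non-numeric
  site  = rep("siteA", 8),                      # invariant
  rin   = c(7.1, 8.0, NA, NA, NA, NA, 9.2, 6.5),  # only 4 non-NA values
  age   = c(34, 51, 29, 47, 62, 38, 45, 55),
  stringsAsFactors = FALSE
)
min_libs <- 5

# Hypothetical helper: TRUE if a column passes all three filters
keep_var <- function(x, min_libs) {
  x <- x[!is.na(x)]
  n_unique <- length(unique(x))
  unique_nonnumeric <- !is.numeric(x) && n_unique == length(x)
  invariant <- n_unique <= 1
  too_few <- length(x) < min_libs
  !(unique_nonnumeric || invariant || too_few)
}

kept <- names(annot)[vapply(annot, keep_var, logical(1), min_libs = min_libs)]
```

In this toy case only age survives: libid is unique and non-numeric, site is invariant, and rin has fewer than min_libs non-NA values.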

Value

a matrix of correlation coefficients, with the column and row names reflecting the PC axes and annotation variables for which correlations were calculated.
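A call might look like the following sketch, which uses the arguments shown under Usage. The simulated data and the object names expr, annot, and pc_cors are assumptions for illustration, as is the assumption that the function is exported from an installed RNAseQC package; the call is skipped otherwise. Only continuous variables are used, so the ICC package is not needed.

```r
# Simulated data: 20 libraries x 50 features, with matching annotation
set.seed(1)
expr <- matrix(rnorm(20 * 50), nrow = 20,
               dimnames = list(paste0("lib", 1:20), paste0("gene", 1:50)))
pca <- prcomp(expr)

annot <- data.frame(
  libid = rownames(expr),     # matched to rownames of the PCA result
  quality = runif(20),
  age = sample(20:70, 20)
)

# Run only if the package is installed (assumed to export calc_PCcors)
if (requireNamespace("RNAseQC", quietly = TRUE)) {
  pc_cors <- RNAseQC::calc_PCcors(
    PCA_result = pca,
    annotation = annot,
    PCs = 1:3,
    id_col = "libid",
    var_cols = c("quality", "age"),
    cont_method = "spearman"
  )
  print(pc_cors)
}
```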


BenaroyaResearch/RNAseQC documentation built on April 19, 2024, 7:38 p.m.