calc_PCcors: Calculate correlations of principal components with quality or annotation variables


This is a helper function to facilitate calculating correlations of sample-specific variables with principal components (generally from RNAseq data). By default, it uses Spearman (rank) correlations for continuous variables and intraclass correlations (as implemented in ICC::ICCbare) for categorical variables. The correlation method can be changed for continuous variables, but not currently for categorical variables.
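As a rough illustration of the default treatment of a continuous variable, the base R sketch below runs a PCA on simulated data and correlates the first few PCs with one numeric annotation column using Spearman correlation. The objects `expr` and `annot` and the `rin` column are invented for the example, and the code does not call calc_PCcors itself.

```r
set.seed(1)
# simulated "expression" matrix: 20 samples (rows) x 50 genes (columns)
expr <- matrix(rnorm(20 * 50), nrow = 20,
               dimnames = list(paste0("lib", 1:20), paste0("gene", 1:50)))
# hypothetical annotation with one continuous quality metric
annot <- data.frame(libid = rownames(expr), rin = runif(20, 5, 10),
                    stringsAsFactors = FALSE)

pca <- prcomp(expr)
scores <- pca$x[, 1:5]     # sample scores on PC1..PC5

# Spearman (rank) correlation of each PC with the continuous variable
pc_cors <- cor(scores, annot$rin, method = "spearman")
pc_cors                    # one row per PC, one column for the variable
```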


  PCs = 1:10,
  id_col = "libid",
  ignore_unique_nonnumeric = TRUE,
  ignore_invariant = TRUE,
  date_as_numeric = TRUE,
  min_libs = 5,
  cont_method = "spearman",
  cat_method = "ICC",



result of a principal component analysis, generally of gene expression data. Typically the output of prcomp or calc_PCAs. Can also be a matrix with samples in rows and dimensions in columns.


a data frame containing annotation data for the samples. May include clinical data, sample quality metrics, etc.


numeric vector of principal component axes to include in correlation calculations. Defaults to 1:10, which will calculate for the first 10 PCs. Any specified PCs that are not present in the PCA object are skipped.
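One way to picture the handling of missing axes is the sketch below, which keeps only the requested PCs that exist in a prcomp result. It uses the built-in USArrests data for convenience. The variable names are illustrative and the function's actual internals may differ.

```r
pca <- prcomp(USArrests, scale. = TRUE)   # 4 variables, so only PC1..PC4 exist
PCs <- 1:10                               # requested axes (the documented default)

# keep only the requested axes that are present in the PCA result
PCs_found <- PCs[PCs <= ncol(pca$x)]
scores <- pca$x[, PCs_found, drop = FALSE]
colnames(scores)
```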


name or number of the column of annotation containing the library identifiers. These are matched to the rownames of PCA_result. Ignored if PCA_result does not have rownames.
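The matching of library identifiers to rownames can be sketched with `match()`, as below. This is an assumption about how such an alignment might be done, not a copy of the function's code. The toy `scores` matrix and `annot` data frame are hypothetical.

```r
# PCA scores with library IDs as rownames (toy values)
scores <- matrix(c(0.5, -1.2, 0.7, 0.1, 0.3, -0.4), nrow = 3,
                 dimnames = list(c("lib3", "lib1", "lib2"), c("PC1", "PC2")))
# hypothetical annotation in a different row order, with an extra library
annot <- data.frame(libid = c("lib1", "lib2", "lib3", "lib4"),
                    age = c(34, 51, 28, 60),
                    stringsAsFactors = FALSE)

id_col <- "libid"   # a column number would also work with [[ ]]
# reorder annotation rows so they line up with the rows of the scores
annot_matched <- annot[match(rownames(scores), annot[[id_col]]), ]
annot_matched$libid
```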


numbers or names of columns to include in the correlation calculations. If not specified, all columns will be included, subject to other exclusion criteria.


logical, whether to drop columns from annotation if they contain unique non-numeric values. Correlations for such variables are meaningless. Defaults to TRUE.


logical, whether to drop columns from annotation if all non-NA values are identical. Correlations for such variables are meaningless. Defaults to TRUE.
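The two exclusion rules described above (unique non-numeric columns and invariant columns) can be approximated in base R as follows. The data frame and the helper logic are a sketch under stated assumptions and are not taken from the function's source.

```r
annot <- data.frame(
  libid   = paste0("lib", 1:6),            # unique, non-numeric
  batch   = c("a", "a", "b", "b", "c", "c"),
  species = rep("human", 6),               # invariant
  rin     = c(7.1, 8.3, NA, 9.0, 6.5, 8.8),
  stringsAsFactors = FALSE
)

# TRUE for non-numeric columns in which every non-NA value is distinct
unique_nonnumeric <- vapply(annot, function(x) {
  x <- x[!is.na(x)]
  !is.numeric(x) && length(x) > 0 && !anyDuplicated(x)
}, logical(1))

# TRUE for columns in which all non-NA values are identical
invariant <- vapply(annot, function(x) {
  length(unique(x[!is.na(x)])) <= 1
}, logical(1))

# columns that would remain for correlation calculations
names(annot)[!(unique_nonnumeric | invariant)]
```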


logical, whether to treat data of class "Date" and "POSIXt" as numeric. If set to FALSE, dates are treated as categorical variables. Defaults to TRUE.
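The difference between the two treatments of dates can be seen with a small base R example. The dates are made up, and the conversion shown is one plausible way to obtain numeric values from "Date" objects, not necessarily what the function does internally.

```r
dates <- as.Date(c("2020-01-01", "2020-01-15", "2020-03-01"))

# numeric treatment: days since 1970-01-01, which preserves order and spacing
dates_num <- as.numeric(dates)
diff(dates_num)

# categorical treatment: each distinct date becomes its own factor level
dates_cat <- factor(dates)
nlevels(dates_cat)
```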


number, the minimum number of libraries containing non-NA values for a variable. Variables in annotation with fewer non-NA values will be dropped. Defaults to 5.
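A minimal sketch of this threshold, assuming it is applied as a simple count of non-NA values per column, is shown below. The `annot` data frame and its columns are hypothetical.

```r
annot <- data.frame(
  rin = c(7.1, 8.3, NA, 9.0, 6.5, 8.8),   # 5 non-NA values
  crp = c(1.2, NA, NA, NA, 3.4, NA)       # 2 non-NA values
)
min_libs <- 5

# count non-NA values per variable and keep those meeting the minimum
n_nonNA <- colSums(!is.na(annot))
keep <- names(annot)[n_nonNA >= min_libs]
keep
```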


character, the name of the correlation coefficient to use for continuous variables. Passed to stats::cor, and must be one of "pearson", "kendall", or "spearman", or abbreviations thereof. Defaults to "spearman".
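To show why the choice of coefficient matters, the example below compares Spearman and Pearson correlations on a monotonic but non-linear relationship, and confirms that stats::cor accepts an abbreviated method name. The vectors `x` and `y` are toy data.

```r
x <- 1:10
y <- x^3    # monotonic but non-linear relationship

r_spearman <- cor(x, y, method = "spearman")  # rank-based
r_pearson  <- cor(x, y, method = "pearson")   # sensitive to non-linearity
r_abbrev   <- cor(x, y, method = "spear")     # abbreviation of "spearman"

c(spearman = r_spearman, pearson = r_pearson, abbreviated = r_abbrev)
```

Because the ranks of `x` and `y` agree exactly, the Spearman coefficient reaches 1 while the Pearson coefficient stays below 1.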


character, the name of the correlation coefficient to use for categorical variables. Currently, the only acceptable option is "ICC", which uses the intraclass correlation coefficient as implemented in ICC::ICCbare.
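For intuition about the intraclass correlation, the sketch below computes a one-way ICC by hand from ANOVA mean squares for a balanced design. It is an illustration of the general idea only. It does not call ICC::ICCbare, and results for unbalanced groups may differ from that implementation. The `batch` and `pc1` objects are simulated.

```r
set.seed(2)
batch <- factor(rep(c("a", "b", "c"), each = 4))   # categorical variable
# toy PC scores with a strong shift between batches
pc1 <- c(a = -2, b = 0, c = 2)[as.character(batch)] + rnorm(12, sd = 0.3)

# one-way ANOVA mean squares: between groups, then within groups
ms <- anova(lm(pc1 ~ batch))[["Mean Sq"]]
k <- 4                                             # libraries per group (balanced)

# share of total variance attributable to differences between groups
icc <- (ms[1] - ms[2]) / (ms[1] + (k - 1) * ms[2])
icc
```

A value near 1 indicates that most of the variation in the PC is explained by the grouping variable.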


(optional) additional arguments passed to cor or other functions.


a matrix of correlation coefficients, with the column and row names reflecting the PC axes and annotation variables for which correlations were calculated.

