eda: A rough function to perform exploratory data analysis on a...

View source: R/eda.R

eda R Documentation

A rough function to perform exploratory data analysis on a given matrix. Returns correlation matrix and PCA summary figures.

Description

eda is a (very) rough function to perform exploratory data analysis on a given matrix. It returns a correlation matrix figure and PCA summary figures. Note: the ggfortify package must be installed and attached, e.g. with library(ggfortify), before calling this function.

Usage

eda(
  mat,
  cor_method = c("pairwise.complete.obs", "pearson"),
  cor_lbl = FALSE,
  impute = 0,
  scale_flag = TRUE,
  pcs = c("PC1", "PC2", "PC3", "PC4", "PC5", "PC6"),
  colgroups = NULL,
  rowgroups = NULL,
  rowgroups_name = "cohort",
  frame = FALSE
)

Arguments

mat

The matrix to perform the eda on.

cor_method

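Passed to the "method" argument of GGally::ggcorr: a character vector of length two giving, first, how missing values are handled when computing the correlations and, second, the type of correlation coefficient. By default, cor_method = c("pairwise.complete.obs", "pearson").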
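
As a minimal base-R sketch of what the two elements of cor_method mean, the snippet below assumes (following the GGally::ggcorr documentation) that the first element plays the role of `use` and the second the role of `method` in stats::cor. The toy matrix `m` is made up for illustration and is not part of the package.

```r
# Hypothetical toy matrix with one missing value (illustration only).
set.seed(1)
m <- matrix(rnorm(30), nrow = 10,
            dimnames = list(NULL, c("a", "b", "c")))
m[2, "b"] <- NA

cor_method <- c("pairwise.complete.obs", "pearson")

# Assumption: first element -> `use`, second element -> `method`.
cm <- cor(m, use = cor_method[1], method = cor_method[2])
```

With "pairwise.complete.obs", each pair of columns uses every row where both values are present, so a single NA does not discard the whole row for the other pairs.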

cor_lbl

A logical indicating if there should be labels on the correlation matrix. By default, cor_lbl = FALSE.

impute

The value to substitute for missing values in mat. By default, impute = 0.

scale_flag

A logical indicating if the matrix should be scaled before running PCA. By default, scale_flag = TRUE.
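
The following sketch shows one way the impute and scale_flag settings could play out in base R. It is an assumption that PCA is run with stats::prcomp; this page does not say which PCA routine eda uses, and the matrix `m` is hypothetical.

```r
# Hypothetical toy matrix with two missing values (illustration only).
set.seed(2)
m <- matrix(rnorm(40), nrow = 10,
            dimnames = list(NULL, paste0("v", 1:4)))
m[c(3, 17)] <- NA

impute <- 0
scale_flag <- TRUE

# Replace every missing value with the constant `impute`.
m_imputed <- m
m_imputed[is.na(m_imputed)] <- impute

# Assumption: PCA via stats::prcomp(), scaling columns when scale_flag is TRUE.
pca <- prcomp(m_imputed, center = TRUE, scale. = scale_flag)
```

Scaling puts every column on unit variance before PCA, which matters when the columns of mat are measured in different units.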

pcs

A character vector detailing which components you would like to get the loadings plots for. By default, pcs = c("PC1", "PC2", "PC3", "PC4", "PC5", "PC6").

colgroups

An optional data frame with the column names of mat in the first column and their grouping in the second column. By default, colgroups = NULL.

rowgroups

An optional vector identifying the group each row of mat belongs to. Its length should equal nrow(mat). By default, rowgroups = NULL.

rowgroups_name

A character string providing the overall name for the rowgroups vector. By default, rowgroups_name = "cohort".

frame

An optional logical specifying if you would like frames over the clusters in a PCA biplot. By default, frame = FALSE. Only worth passing as TRUE when rowgroups is not NULL.
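
The argument descriptions above can be put together in a short, hedged usage sketch. All data, column names, and group labels below are invented for illustration, and the call to eda only runs if PCPhelpers and ggfortify are installed; whether eda accepts these exact toy inputs has not been verified here.

```r
# Hypothetical inputs; names and group labels are made up.
set.seed(3)
mat <- matrix(rnorm(60), nrow = 12,
              dimnames = list(NULL, paste0("chem", 1:5)))

# colgroups: column names of mat first, their grouping second.
colgroups <- data.frame(
  chem  = colnames(mat),
  group = c("A", "A", "B", "B", "B")
)

# rowgroups: one label per row of mat.
rowgroups <- rep(c("site1", "site2"), each = 6)

# Only call eda() when the required packages are available.
if (requireNamespace("PCPhelpers", quietly = TRUE) &&
    requireNamespace("ggfortify", quietly = TRUE)) {
  library(ggfortify)  # must be attached before calling eda()
  out <- PCPhelpers::eda(
    mat,
    pcs = c("PC1", "PC2", "PC3"),
    colgroups = colgroups,
    rowgroups = rowgroups,
    rowgroups_name = "site",
    frame = TRUE      # frames only make sense because rowgroups is given
  )
  names(out)
}
```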

Value

A list containing the following:

cor

A correlation matrix figure. From GGally::ggcorr.

var

A summary table/figure describing PCA's principal components and their respective proportion of variance explained. From the kableExtra package.

var_raw

The raw data underlying the var figure.
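
For orientation, the proportion of variance explained by each component can be computed as in the sketch below. It assumes PCA via stats::prcomp and uses a made-up matrix; the actual layout of var_raw is not documented on this page, so `var_tbl` and its column names are hypothetical.

```r
# Hypothetical toy data (illustration only).
set.seed(4)
m <- matrix(rnorm(50), nrow = 10)
pca <- prcomp(m, scale. = TRUE)

# The variance of each component is its squared standard deviation.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)

# Hypothetical summary table; not necessarily the shape of var_raw.
var_tbl <- data.frame(
  PC       = paste0("PC", seq_along(prop_var)),
  prop_var = prop_var,
  cum_var  = cumsum(prop_var)
)
```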

load

A figure of the loadings for the components named in pcs. With the default value of pcs, you get the loadings for the first six components.

biplot1

A PCA biplot figure of PC1 plotted against PC2.

biplot2

A PCA biplot figure of PC1 plotted against PC3. NULL if PCA returns fewer than three PCs.

biplot3

A PCA biplot figure of PC2 plotted against PC3. NULL if PCA returns fewer than three PCs.
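
To see when fewer than three PCs are available, consider a matrix with only two columns. The sketch below assumes PCA via stats::prcomp, which returns at most min(nrow, ncol) components; the matrix and the variable names are hypothetical.

```r
# Hypothetical 10 x 2 matrix (illustration only).
set.seed(5)
two_col <- matrix(rnorm(20), nrow = 10)

# prcomp() returns one score column per component.
n_pcs <- ncol(prcomp(two_col)$x)

# With only two components there is no PC3 to plot against.
has_third_pc <- n_pcs >= 3
```

In that situation only biplot1 can be drawn, which matches the NULL values described for biplot2 and biplot3.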


Columbia-PRIME/PCPhelpers documentation built on April 24, 2022, 7:57 p.m.