olink_pca_plot: Function to plot a PCA of the data

olink_pca_plot    R Documentation

Function to plot a PCA of the data

Description

Generates a PCA projection of all samples from NPX data along two principal components (default PC2 vs. PC1), showing the explained variance and, by default, coloring points by QC_Warning, using stats::prcomp and ggplot2::ggplot.
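As a rough illustration of the projection described above, the base R sketch below pivots simulated long-format data into a sample-by-assay matrix, runs a scaled and centered stats::prcomp, and computes the percentage of variance explained that would appear in the axis labels. The simulated data, the column names OlinkID and SampleID in this toy data frame, and the axis-label format are assumptions for illustration only, and the ggplot2 plotting step is omitted. This is not the package's internal code.

```r
# Sketch only: simulated long-format data with one row per sample/assay pair.
set.seed(1)
long_df <- expand.grid(SampleID = paste0("S", 1:20),
                       OlinkID  = paste0("OID", 1:10),
                       stringsAsFactors = FALSE)
long_df$NPX <- rnorm(nrow(long_df))

# Pivot to a samples x assays matrix (base R, no tidyr needed)
wide <- tapply(long_df$NPX, list(long_df$SampleID, long_df$OlinkID), mean)

# Scaled and centered PCA
pca <- prcomp(wide, center = TRUE, scale. = TRUE)

# Percentage of variance explained by each PC (used for axis labels)
var_explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)
x_val <- 1
y_val <- 2
x_lab <- sprintf("PC%d (%.1f%%)", x_val, var_explained[x_val])
y_lab <- sprintf("PC%d (%.1f%%)", y_val, var_explained[y_val])

# Sample scores along the two plotted PCs
scores <- data.frame(SampleID = rownames(pca$x),
                     PCX = pca$x[, x_val],
                     PCY = pca$x[, y_val])
```

The scores data frame holds one row per sample and could be passed to ggplot2 with x_lab and y_lab as axis titles.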

Usage

olink_pca_plot(
  df,
  color_g = "QC_Warning",
  x_val = 1,
  y_val = 2,
  label_samples = FALSE,
  drop_assays = FALSE,
  drop_samples = FALSE,
  n_loadings = 0,
  loadings_list = NULL,
  byPanel = FALSE,
  outlierDefX = NA,
  outlierDefY = NA,
  outlierLines = FALSE,
  label_outliers = TRUE,
  quiet = FALSE,
  verbose = TRUE,
  ...
)

Arguments

df

Data frame in long format with SampleID, NPX and a column of choice for colors.

color_g

Character value indicating which column to use for colors (default QC_Warning)

x_val

Integer indicating which principal component to plot along the x-axis (default 1)

y_val

Integer indicating which principal component to plot along the y-axis (default 2)

label_samples

Logical. If TRUE, points are replaced with SampleID (default FALSE)

drop_assays

Logical. If TRUE, all assays with any missing values are dropped. Takes precedence over drop_samples (default FALSE).

drop_samples

Logical. If TRUE, all samples with any missing values are dropped (default FALSE).

n_loadings

Integer. Plots the top n_loadings loadings, ranked by size (default 0).

loadings_list

Character vector of the OlinkIDs to plot as loadings. n_loadings and loadings_list can be used simultaneously.

byPanel

Logical. Perform the PCA per panel (default FALSE).

outlierDefX

The number of standard deviations along the PC plotted on the x-axis that defines an outlier. See also 'Details'.

outlierDefY

The number of standard deviations along the PC plotted on the y-axis that defines an outlier. See also 'Details'.

outlierLines

Draw dashed lines at +/-outlierDef[X,Y] standard deviations from the mean of the plotted PCs (default FALSE)

label_outliers

Use ggrepel to label samples lying outside the limits set by the outlierLines (default TRUE)

quiet

Logical. If TRUE, the resulting plot is not printed

verbose

Logical. Whether warnings about the number of samples and/or assays dropped or imputed should be printed to the console.

...

coloroption, passed to specify the color order.
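The way n_loadings and loadings_list combine can be sketched in base R as follows. This is an illustration only: select_loadings() is a hypothetical helper, and ranking loadings by their length in the plane of the two plotted PCs is an assumption about what "size" means, not a description of the package internals.

```r
# Sketch only: pick which assay loadings to draw.
# ASSUMPTION: "size" = length of each loading vector in the plane of the
# two plotted PCs; the package may rank loadings differently.
select_loadings <- function(rotation, x_val = 1, y_val = 2,
                            n_loadings = 0, loadings_list = NULL) {
  if (is.null(loadings_list)) loadings_list <- character(0)
  size <- sqrt(rotation[, x_val]^2 + rotation[, y_val]^2)
  top <- character(0)
  if (n_loadings > 0) {
    ranked <- names(sort(size, decreasing = TRUE))
    top <- ranked[seq_len(min(n_loadings, length(ranked)))]
  }
  # n_loadings and loadings_list can be combined; duplicates are removed
  union(top, intersect(loadings_list, rownames(rotation)))
}

set.seed(2)
X <- matrix(rnorm(30 * 6), nrow = 30,
            dimnames = list(NULL, paste0("OID", 1:6)))
pca <- prcomp(X, center = TRUE, scale. = TRUE)
sel <- select_loadings(pca$rotation, n_loadings = 2, loadings_list = "OID5")
```

Using union() keeps an OlinkID that appears both among the top loadings and in loadings_list from being drawn twice.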

Details

By default, values are scaled and centered in the PCA, and proteins with missing NPX values are removed from the corresponding assay. Sample names must be unique. Missing values are imputed by the median for assays with missingness below 10% in multi-plate projects and below 5% in single-plate projects. The plot is printed, and a list of ggplot objects is returned.
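The median-imputation rule stated in this section can be sketched on a sample-by-assay matrix as below. impute_median() and its n_plates argument are hypothetical; missingness is assumed to be the fraction of NA values per assay (column), and assays at or above the threshold are simply left unchanged in this sketch, which may differ from what olink_pca_plot does with them.

```r
# Sketch only: per-assay median imputation with a plate-dependent threshold.
# ASSUMPTION: missingness = fraction of NA values in each assay column.
impute_median <- function(wide, n_plates = 1) {
  max_missing <- if (n_plates > 1) 0.10 else 0.05
  for (j in seq_len(ncol(wide))) {
    frac_na <- mean(is.na(wide[, j]))
    if (frac_na > 0 && frac_na < max_missing) {
      wide[is.na(wide[, j]), j] <- median(wide[, j], na.rm = TRUE)
    }
  }
  wide
}

m <- matrix(c(1:30, 1:30), ncol = 2,
            dimnames = list(NULL, c("OID1", "OID2")))
m[3, 1] <- NA        # 1/30 (about 3.3%) missing in OID1
m[c(5, 6), 2] <- NA  # 2/30 (about 6.7%) missing in OID2

imputed_single <- impute_median(m, n_plates = 1)  # only OID1 is imputed
imputed_multi  <- impute_median(m, n_plates = 2)  # both assays are imputed
```

With one plate the 5% limit applies, so OID2 keeps its NA values; with several plates the 10% limit lets both assays be imputed.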

If byPanel = TRUE, the data processing (imputation of missing values, etc.) and the subsequent PCA are performed separately for each panel. A faceted plot is printed, and the individual ggplot objects are returned.
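A minimal base R sketch of running one PCA per panel is shown below. It assumes the long-format data carries a Panel column; the panel names are hypothetical, and the pre-processing steps (imputation etc.) are omitted for brevity.

```r
# Sketch only: one scaled and centered PCA per panel.
set.seed(3)
long_df <- expand.grid(SampleID = paste0("S", 1:15),
                       OlinkID  = paste0("OID", 1:8),
                       stringsAsFactors = FALSE)
# Hypothetical panel assignment: 4 assays per panel
long_df$Panel <- ifelse(long_df$OlinkID %in% paste0("OID", 1:4),
                        "Panel_A", "Panel_B")
long_df$NPX <- rnorm(nrow(long_df))

pca_by_panel <- lapply(split(long_df, long_df$Panel), function(d) {
  # samples x assays matrix restricted to this panel's assays
  wide <- tapply(d$NPX, list(d$SampleID, d$OlinkID), mean)
  prcomp(wide, center = TRUE, scale. = TRUE)
})
```

Each list element is an independent prcomp result, so explained variance and sample scores are computed per panel rather than across all assays.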

The arguments outlierDefX and outlierDefY can be used to identify outliers in the PCA. Samples more than +/-outlierDef[X,Y] standard deviations from the mean of the plotted PC will be labelled. Both arguments have to be specified.
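The outlier rule can be sketched on a pair of PC score vectors as follows. flag_outliers() is a hypothetical helper; counting a sample as an outlier when it lies beyond the limit along either plotted PC is an assumption, and the sample standard deviation from stats::sd is used.

```r
# Sketch only: flag samples beyond +/- outlierDef standard deviations
# from the mean of each plotted PC.
# ASSUMPTION: a sample is an outlier if it exceeds the limit on either PC.
flag_outliers <- function(pcx, pcy, outlierDefX, outlierDefY) {
  out_x <- abs(pcx - mean(pcx)) > outlierDefX * sd(pcx)
  out_y <- abs(pcy - mean(pcy)) > outlierDefY * sd(pcy)
  as.integer(out_x | out_y)  # 1 = outlier, 0 = otherwise
}

pcx <- c(rep(c(-1, 1), 10), 15)  # last sample far out along the x-axis PC
pcy <- c(rep(c(-1, 1), 10), 0)
flags <- flag_outliers(pcx, pcy, outlierDefX = 2, outlierDefY = 4)
```

Only the last sample is flagged, since it lies about 4.2 standard deviations from the mean along the first PC.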

Value

A list of objects of class "ggplot", each containing a scatter plot of the PCs.

Examples


library(OlinkAnalyze)
library(dplyr)
npx_data <- npx_data1 %>%
    filter(!grepl('CONTROL', SampleID))

#PCA using all the data
olink_pca_plot(df=npx_data, color_g = "QC_Warning")

#PCA per panel
g <- olink_pca_plot(df=npx_data, color_g = "QC_Warning", byPanel = TRUE)
g[[2]] #Plot only the second panel

#Label outliers
olink_pca_plot(df=npx_data, color_g = "QC_Warning",
               outlierDefX = 2, outlierDefY = 4) #All data
olink_pca_plot(df=npx_data, color_g = "QC_Warning",
               outlierDefX = 2.5, outlierDefY = 4, byPanel = TRUE) #Per panel

#Retrieve the outliers
g <- olink_pca_plot(df=npx_data, color_g = "QC_Warning",
                    outlierDefX = 2.5, outlierDefY = 4, byPanel = TRUE)
outliers <- lapply(g, function(x){x$data}) %>%
    bind_rows() %>%
    filter(Outlier == 1)


OlinkAnalyze documentation built on Nov. 4, 2023, 1:07 a.m.