olink_pca_plot | R Documentation |
Generates a PCA projection of all samples from NPX data along two principal components (default PC2 vs. PC1), including the explained variance, with dots colored by QC_Warning. Uses stats::prcomp and ggplot2::ggplot.
olink_pca_plot(
  df,
  color_g = "QC_Warning",
  x_val = 1,
  y_val = 2,
  label_samples = FALSE,
  drop_assays = FALSE,
  drop_samples = FALSE,
  n_loadings = 0,
  loadings_list = NULL,
  byPanel = FALSE,
  outlierDefX = NA,
  outlierDefY = NA,
  outlierLines = FALSE,
  label_outliers = TRUE,
  quiet = FALSE,
  verbose = TRUE,
  ...
)
df |
Data frame in long format with SampleID, NPX, and the column chosen for colors |
color_g |
Character value indicating which column to use for colors (default QC_Warning) |
x_val |
Integer indicating which principal component to plot along the x-axis (default 1) |
y_val |
Integer indicating which principal component to plot along the y-axis (default 2) |
label_samples |
Logical. If TRUE, points are replaced with SampleID (default FALSE) |
drop_assays |
Logical. If TRUE, all assays with any missing values are dropped. Takes precedence over drop_samples (default FALSE). |
drop_samples |
Logical. If TRUE, all samples with any missing values are dropped (default FALSE). |
n_loadings |
Integer. Number of top loadings to plot, ranked by size (default 0). |
loadings_list |
Character vector of OlinkIDs to plot as loadings. n_loadings and loadings_list can be used simultaneously. |
byPanel |
Perform the PCA per panel (default FALSE) |
outlierDefX |
The number of standard deviations along the PC plotted on the x-axis that defines an outlier. See also 'Details'. |
outlierDefY |
The number of standard deviations along the PC plotted on the y-axis that defines an outlier. See also 'Details'. |
outlierLines |
Draw dashed lines at +/- outlierDefX and outlierDefY standard deviations from the mean of the plotted PCs (default FALSE) |
label_outliers |
Use ggrepel to label samples lying outside the limits set by the outlierLines (default TRUE) |
quiet |
Logical. If TRUE, the resulting plot is not printed |
verbose |
Logical. Whether warnings about the number of samples and/or assays dropped or imputed should be printed to the console. |
... |
coloroption passed to specify color order. |
The values are by default scaled and centered in the PCA and proteins with missing NPX values are by default removed from the corresponding assay.
Unique sample names are required.
Imputation by the median is done for assays with missingness <10\
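As a rough illustration of the preprocessing described above (not the package's internal code), the sketch below reshapes a small made-up long-format table into a sample-by-assay matrix, imputes missing values with the per-assay median, and runs stats::prcomp with centering and scaling. The toy data and the names toy, wide, pca, and explained are assumptions for illustration only, and no missingness threshold is applied.

```r
# Toy long-format data: 6 samples x 3 assays (made-up values).
set.seed(1)
toy <- expand.grid(
  SampleID = paste0("S", 1:6),
  OlinkID  = c("OID1", "OID2", "OID3"),
  stringsAsFactors = FALSE
)
toy$NPX <- rnorm(nrow(toy), mean = 5)
toy$NPX[2] <- NA  # one missing value in OID1

# Long -> wide: rows are samples, columns are assays.
# Each cell holds exactly one value, so mean() simply returns it.
wide <- tapply(toy$NPX, list(toy$SampleID, toy$OlinkID), mean)

# Replace each assay's missing values with that assay's median.
wide <- apply(wide, 2, function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
})

# PCA on centered and scaled values.
pca <- prcomp(wide, center = TRUE, scale. = TRUE)

# Percentage of variance explained by each principal component.
explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)
round(explained, 1)
```

The explained percentages computed this way are the kind of values that would appear in the axis labels of a PCA plot.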
The plot is printed, and a list of ggplot objects is returned.
If byPanel = TRUE, the data processing (imputation of missing values etc) and subsequent PCA is performed separately per panel. A faceted plot is printed, while the individual ggplot objects are returned.
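The per-panel behaviour can be pictured as splitting the data by panel and running the same PCA on each subset. The sketch below does this in base R on made-up data. The Panel column name and its values, as well as the helper run_pca, are assumptions for illustration, and the package's own implementation may differ.

```r
# Toy long-format data: 5 samples x 4 assays (made-up values).
set.seed(2)
toy <- expand.grid(
  SampleID = paste0("S", 1:5),
  OlinkID  = paste0("OID", 1:4),
  stringsAsFactors = FALSE
)
# Hypothetical panel assignment: two assays per panel.
toy$Panel <- ifelse(toy$OlinkID %in% c("OID1", "OID2"), "Panel A", "Panel B")
toy$NPX <- rnorm(nrow(toy), mean = 5)

# Hypothetical helper: PCA on the sample-by-assay matrix of one panel.
run_pca <- function(d) {
  wide <- tapply(d$NPX, list(d$SampleID, d$OlinkID), mean)
  prcomp(wide, center = TRUE, scale. = TRUE)
}

# One PCA result per panel, collected in a named list.
pca_by_panel <- lapply(split(toy, toy$Panel), run_pca)
names(pca_by_panel)
```

Keeping the results in a named list mirrors the idea of one returned object per panel, so a single panel can be picked out by position or name.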
The arguments outlierDefX and outlierDefY can be used to identify outliers in the PCA. Samples more than +/- outlierDefX and outlierDefY standard deviations from the mean of the plotted PC will be labelled. Both arguments have to be specified.
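One plausible reading of this rule, sketched below on made-up PC scores, flags a sample when it lies more than outlierDefX standard deviations from the mean on the x-axis PC, or more than outlierDefY standard deviations from the mean on the y-axis PC. Whether the package combines the two axes in exactly this way is an assumption here, and the names scores, out_x, and out_y are hypothetical.

```r
# Made-up scores on the two plotted PCs for 10 samples.
scores <- data.frame(
  SampleID = paste0("S", 1:10),
  PCX = c(-1, 0, 1, 0.5, -0.5, 0.2, -0.2, 0.1, -0.1, 12),
  PCY = c(0.3, -0.3, 0.1, -0.1, 0.2, -0.2, 0, 0.4, -0.4, 0),
  stringsAsFactors = FALSE
)
outlierDefX <- 2.5
outlierDefY <- 4

# Distance from the mean, compared with the allowed number of SDs per axis.
out_x <- abs(scores$PCX - mean(scores$PCX)) > outlierDefX * sd(scores$PCX)
out_y <- abs(scores$PCY - mean(scores$PCY)) > outlierDefY * sd(scores$PCY)

# Assumed combination: a sample is an outlier if it exceeds either limit.
scores$Outlier <- as.integer(out_x | out_y)
scores$SampleID[scores$Outlier == 1]
```

Because the mean and standard deviation are computed from the plotted scores themselves, a single extreme sample inflates the standard deviation, so very strict thresholds may flag nothing in small data sets.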
A list of objects of class "ggplot"; each plot contains a scatter plot of the PCs.
library(OlinkAnalyze)
library(dplyr)
npx_data <- npx_data1 %>%
  filter(!grepl('CONTROL', SampleID))
# PCA using all the data
olink_pca_plot(df = npx_data, color_g = "QC_Warning")
# PCA per panel
g <- olink_pca_plot(df = npx_data, color_g = "QC_Warning", byPanel = TRUE)
g[[2]] # Plot only the second panel
# Label outliers
olink_pca_plot(df = npx_data, color_g = "QC_Warning",
               outlierDefX = 2, outlierDefY = 4) # All data
olink_pca_plot(df = npx_data, color_g = "QC_Warning",
               outlierDefX = 2.5, outlierDefY = 4, byPanel = TRUE) # Per panel
# Retrieve the outliers
g <- olink_pca_plot(df = npx_data, color_g = "QC_Warning",
                    outlierDefX = 2.5, outlierDefY = 4, byPanel = TRUE)
outliers <- lapply(g, function(x){x$data}) %>%
  bind_rows() %>%
  filter(Outlier == 1)