Olink® Analyze Vignette

knitr::opts_chunk$set(
  fig.width = 6,
  fig.height = 3,
  fig.align = "center",
  collapse = TRUE,
  comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)

Olink® Analyze is an R package that provides a versatile toolbox to enable fast and easy handling of Olink® NPX data for your proteomics research. Olink® Analyze provides functions for using Olink data, including functions for importing Olink® NPX datasets exported from the NPX Manager, as well as quality control (QC) plot functions and functions for various statistical tests. This package is meant to provide a convenient pipeline for your Olink NPX data analysis.

Installation

You can install Olink® Analyze from CRAN.

install.packages("OlinkAnalyze")

List of functions

Preprocessing

Statistical analysis

Visualization

Sample datasets

Usage

Load the library

# Load OlinkAnalyze
library(OlinkAnalyze)

# Load other libraries used in Vignette
library(dplyr)
library(ggplot2)
library(stringr)

Introduction to Olink NPX data format

The package contains two test data files named npx_data1 and npx_data2. These are synthetic datasets that resemble Olink® data accompanied by clinical variables. Olink® data that are delivered in long format, or imported with the function read_NPX (which converts the data into long format), contain the following columns:

Note: There are 5 additional variables in the sample datasets npx_data1 and npx_data2 that include clinical or other information, namely: Subject <chr>, Treatment <chr>, Site <chr>, Time <chr>, Project <chr>.

Preprocessing

Read NPX data (read_NPX)

The read_NPX function imports an NPX file in wide format that has been exported from Olink® NPX Manager and converts the data into long format, which is the preferred layout for analysis in R. The wide format is the most common way Olink® delivers data for Olink® Target 96. For this function to work as expected, the output of the NPX Manager must not be altered before import.
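
To make the difference between the two layouts concrete, here is a minimal base R sketch. The table, the assay names and the NPX values are invented for illustration; this is not how read_NPX parses a real NPX Manager export, only a picture of what "wide" and "long" mean.

```r
# Toy wide table: one row per sample, one column per assay (hypothetical values)
wide <- data.frame(
  SampleID = c("A1", "A2"),
  IL8 = c(5.1, 4.8),
  TNF = c(2.3, 2.9)
)
assays <- c("IL8", "TNF")

# Long format: one row per SampleID/Assay pair, with a single NPX column
long <- data.frame(
  SampleID = rep(wide$SampleID, times = length(assays)),
  Assay    = rep(assays, each = nrow(wide)),
  NPX      = unlist(wide[assays], use.names = FALSE)
)
long
```

In the long layout every additional annotation (panel, plate, QC flag) becomes one more column instead of another header row, which is why the analysis functions expect it.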

Function arguments

data <- read_NPX("~/NPX_file_location.xlsx")

Function output

A tibble in long format containing:

Randomize samples on plate (olink_plate_randomizer)

The olink_plate_randomizer function randomly assigns samples to a plate well, with the option to keep samples from the same individual on the same plate. Olink® does not recommend forcing balance based on other clinical variables.
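
The idea of randomizing while keeping each individual on one plate can be sketched in base R as follows. This is a simplified illustration, not the algorithm used by olink_plate_randomizer: the manifest, the plate size of 6 and the column names plate and well are assumptions, and the sequential filling only works cleanly here because every subject has the same number of samples.

```r
set.seed(111)

# Hypothetical manifest: 4 subjects with 3 samples each
manifest <- data.frame(
  SampleID  = sprintf("S%02d", 1:12),
  SubjectID = rep(paste0("P", 1:4), each = 3)
)
plate_size <- 6  # deliberately tiny plate so the toy data spans two plates

# Shuffle the subjects, then fill plates subject by subject so that
# all samples of one subject share a plate
subjects <- sample(unique(manifest$SubjectID))
n_per_subject <- vapply(subjects,
                        function(s) sum(manifest$SubjectID == s),
                        integer(1))
plate_of_subject <- ceiling(cumsum(n_per_subject) / plate_size)
manifest$plate <- as.integer(plate_of_subject[manifest$SubjectID])

# Randomize the well order within each plate
manifest$well <- ave(seq_len(nrow(manifest)), manifest$plate,
                     FUN = function(i) sample(length(i)))
manifest[order(manifest$plate, manifest$well), ]
```

With unequal numbers of samples per subject this naive filling could overfill a plate, so it should be read as a picture of the constraint rather than a replacement for the package function.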

Function arguments

olink_plate_randomizer(manifest, 
                       SubjectColumn ="SampleID",
                       seed=111)

Function output

A tibble including SampleID, SubjectID etc. assigned to well positions.

Select bridge samples (olink_bridgeselector)

The bridge selection function selects a number of bridge samples based on the input data. Bridge samples are used to normalize two dataframes/projects that have been run at different time points, so that a batch effect is expected. It selects samples that fulfill certain criteria, including good detectability, passing quality control, and covering a wide range of data points. When possible, the function recommends 8-16 bridge samples.

Bridge sample selection strategy: Start by choosing samples with at most 10% missingness (sampleMissingFreq = 0.1), and in case there are not enough samples to output, increase the threshold to 20% (sampleMissingFreq = 0.2).
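
The two-step threshold strategy can be sketched in base R. The toy data below are invented, and treating "missingness" as the fraction of assays with NPX below LOD is an assumption made for this illustration; the other selection criteria of olink_bridgeselector (QC status, spread of NPX) are not modelled.

```r
# Hypothetical long-format data: 4 samples x 10 assays, LOD fixed at 1
npx <- expand.grid(Assay = paste0("Assay", 1:10),
                   SampleID = paste0("S", 1:4),
                   stringsAsFactors = FALSE)
npx$LOD <- 1
npx$NPX <- 2

# Push a chosen number of values per sample below LOD
n_below <- c(S1 = 0, S2 = 1, S3 = 2, S4 = 5)
for (s in names(n_below)) {
  idx <- which(npx$SampleID == s)[seq_len(n_below[[s]])]
  npx$NPX[idx] <- 0.5
}

# Fraction of values below LOD per sample
missing_freq <- tapply(npx$NPX < npx$LOD, npx$SampleID, mean)

n_wanted <- 3  # hypothetical number of bridge samples requested
candidates <- names(missing_freq)[missing_freq <= 0.1]
if (length(candidates) < n_wanted) {
  # Not enough samples at 10%: relax the threshold to 20%
  candidates <- names(missing_freq)[missing_freq <= 0.2]
}
candidates
```

Here only two samples pass the 10% threshold, so the threshold is relaxed and a third sample becomes eligible.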

Function arguments

# Select overlapping samples
olink_bridgeselector(df = npx_data1, 
                     sampleMissingFreq = 0.1,
                     n = 8)

Function output

Tibble with sample IDs and mean NPX for the pre-defined number of bridging samples.

Normalizing NPX data (olink_normalization)

The olink_normalization function is used to normalize NPX values between two different dataframes/projects that have been run at different times. Commonly, there is a shift in (mean) NPX values between runs, while the spread of the data remains the same. This is why normalization between dataframes/projects is required. When normalization is performed, one of the two provided dataframes/projects shall be used as a reference. If two dataframes/projects have been normalized to one another, Olink® by default uses the chronologically older one as reference. The function handles three different types of normalization:

Function arguments

# Find overlapping samples
overlap_samples <- intersect(npx_data1$SampleID, npx_data2$SampleID)
# Remove control samples
overlap_samples <- overlap_samples[!str_detect(overlap_samples, 'CONTROL_SAMPLE')]
# Perform Bridging normalization
olink_normalization(df1 = npx_data1, 
                    df2 = npx_data2, 
                    overlapping_samples_df1 = overlap_samples,
                    df1_project_nr = '20200001',
                    df2_project_nr = '20200002',
                    reference_project = '20200001')

# Example of using all samples for normalization
subset_df1 <- npx_data1 %>% 
  filter(QC_Warning == 'Pass') %>% 
  filter(!str_detect(SampleID, 'CONTROL_SAMPLE')) %>%
  pull(SampleID) %>% 
  unique()

subset_df2 <- npx_data2 %>% 
  filter(QC_Warning == 'Pass') %>% 
  filter(!str_detect(SampleID, 'CONTROL_SAMPLE')) %>%
  pull(SampleID) %>% 
  unique()

olink_normalization(df1 = npx_data1, 
                    df2 = npx_data2, 
                    overlapping_samples_df1 = subset_df1,
                    overlapping_samples_df2 = subset_df2,
                    df1_project_nr = '20200001',
                    df2_project_nr = '20200002',
                    reference_project = '20200001')
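
To illustrate what a bridging adjustment does for a single assay, the base R sketch below assumes that the adjustment factor is the median of the paired differences between the bridge samples in the reference project and the same samples in the other project. That reading, the sample names and the NPX values are assumptions for illustration and are not taken from the olink_normalization source.

```r
# Hypothetical NPX for one assay in three bridge samples measured in both projects
ref   <- c(B1 = 5.0, B2 = 6.0, B3 = 7.0)  # reference project
other <- c(B1 = 4.4, B2 = 5.5, B3 = 6.5)  # project to be adjusted

# Per-assay adjustment factor: median of the paired differences
adj_factor <- median(ref - other[names(ref)])

# Apply the same shift to every sample of the non-reference project
other_all  <- c(other, X1 = 3.0)
normalized <- other_all + adj_factor
normalized
```

Because a single shift is applied per assay, the spread of the non-reference data is unchanged, which matches the observation above that runs differ mainly in their mean level.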

Function output

A tibble of NPX data in long format containing normalized NPX values, including adjustment factors:

Statistical analysis

T-test analysis (olink_ttest)

The olink_ttest function performs a Welch two-sample t-test or a paired t-test at confidence level 0.95 for every protein (by OlinkID) for a given grouping variable, using the function t.test from the R library stats. It corrects for multiple testing with the Benjamini-Hochberg method (“fdr”) using the function p.adjust from the R library stats. Adjusted p-values are evaluated against the threshold adjusted p-value < 0.05. The resulting t-test table is arranged by ascending p-values.
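
The per-protein testing pattern can be sketched with base R alone. The data frame toy and its values are simulated for illustration, and multiple-testing correction is left out to keep the sketch short; this is not the internal code of olink_ttest.

```r
set.seed(42)

# Simulated long-format data: two assays, two treatment groups of 10 samples
toy <- data.frame(
  OlinkID   = rep(c("OID_A", "OID_B"), each = 20),
  Treatment = rep(rep(c("Treated", "Untreated"), each = 10), times = 2),
  NPX       = c(rnorm(10, 6), rnorm(10, 4), rnorm(20, 5))
)

# One Welch two-sample t-test per assay
results <- do.call(rbind, lapply(split(toy, toy$OlinkID), function(d) {
  tt <- t.test(NPX ~ Treatment, data = d, var.equal = FALSE, conf.level = 0.95)
  data.frame(OlinkID  = d$OlinkID[1],
             estimate = unname(tt$estimate[1] - tt$estimate[2]),
             p.value  = tt$p.value)
}))

# Arrange by ascending p-value, as in the package output
results <- results[order(results$p.value), ]
results
```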

Function arguments

olink_ttest(df = npx_data1,
            variable = 'Treatment')

Function output

A tibble with the following columns:

Mann-Whitney U Test analysis (olink_wilcox)

The olink_wilcox function performs a two-sample Mann-Whitney U test or a paired Mann-Whitney U test at confidence level 0.95 for every protein (by OlinkID) for a given grouping variable, using the function wilcox.test from the R library stats. It corrects for multiple testing with the Benjamini-Hochberg method (“fdr”) using the function p.adjust from the R library stats. Adjusted p-values are evaluated against the threshold adjusted p-value < 0.05. The resulting Mann-Whitney U table is arranged by ascending p-values.

Function arguments

olink_wilcox(df = npx_data1,
             variable = 'Treatment')

Function output

A tibble with the following columns:

Analysis of variance (ANOVA) (olink_anova)

The olink_anova is a wrapper function that performs an ANOVA F-test for each assay using the function Anova from the R library car and Type III sum of squares. The function handles both factor and numerical variables, and/or confounding factors.

Samples with missing variable information or factor levels are excluded from the analysis. Character columns in the input data frame are converted to factors. The automatic handling of the data from above is announced by a message if the flag verbose=TRUE.

Crossed/interaction analysis, i.e. A*B formula notation, is inferred from the variable argument in the following cases:

Inference is specified in a message if verbose=TRUE.

For covariates, crossed analyses need to be specified explicitly, i.e. two main effects will not be expanded with a c('A','B') notation. Main effects present in the variable take precedence. The formula notation of the final model is specified in a message if verbose=TRUE.

Adjusted p-values are calculated using the function p.adjust from the R library stats with the Benjamini & Hochberg (1995) method (“fdr”). The threshold is determined by logic evaluation of Adjusted_pval < 0.05. Covariates are not included in the p-value adjustment.
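
As a small worked example of the adjustment and thresholding step, the sketch below applies p.adjust to four made-up p-values. The label "Non-significant" is an assumption for illustration; only the label 'Significant' appears in the examples of this vignette.

```r
# Four hypothetical raw p-values, one per assay
p <- c(0.001, 0.01, 0.03, 0.2)

# Benjamini-Hochberg adjustment ("fdr" is an alias for "BH" in p.adjust)
adjusted <- p.adjust(p, method = "fdr")

# Threshold by logical evaluation of the adjusted p-value
threshold <- ifelse(adjusted < 0.05, "Significant", "Non-significant")
data.frame(p, Adjusted_pval = adjusted, Threshold = threshold)
```

Each sorted p-value is multiplied by the number of tests divided by its rank, and monotonicity is then enforced from the largest p-value downwards, which gives 0.004, 0.02, 0.04 and 0.2 for these inputs.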

Function arguments

# One-way ANOVA, no covariates
anova_results_oneway <- olink_anova(df = npx_data1, 
                                    variable = 'Site')
# Two-way ANOVA, no covariates
anova_results_twoway <- olink_anova(df = npx_data1, 
                                    variable = c('Site', 'Time'))
# One-way ANOVA, Treatment as covariate
anova_results_oneway <- olink_anova(df = npx_data1, 
                                    variable = 'Site',
                                    covariates = 'Treatment')

Function output

A tibble with the following columns:

Post-hoc ANOVA analysis (olink_anova_posthoc)

olink_anova_posthoc performs a post-hoc ANOVA test using the function emmeans from the R library emmeans with Tukey p-value adjustment per assay (by OlinkID) at confidence level 0.95.

The function handles both factor and numerical variables and/or covariates. The post-hoc test for a numerical variable compares the difference in means of the outcome variable (default: NPX) for 1 standard deviation (SD) difference in the numerical variable, e.g. mean NPX at mean (numerical variable) versus mean NPX at mean (numerical variable) + 1*SD (numerical variable).
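
The contrast for a numerical variable can be illustrated with a plain linear model. The sketch below uses lm and predict on simulated data instead of emmeans, and the variable names age and npx are hypothetical; it only shows what "mean NPX at the mean versus at the mean + 1 SD" amounts to.

```r
set.seed(7)

# Simulated numerical variable and outcome
age <- rnorm(50, mean = 60, sd = 10)
npx <- 3 + 0.05 * age + rnorm(50, sd = 0.5)

fit <- lm(npx ~ age)

# Predicted mean NPX at mean(age) and at mean(age) + 1 * SD(age)
at   <- data.frame(age = c(mean(age), mean(age) + sd(age)))
pred <- predict(fit, newdata = at)

# Difference in predicted means for a 1 SD difference in age
contrast <- unname(pred[2] - pred[1])
contrast
```

In a model that is linear in the numerical variable, this difference equals the fitted slope multiplied by the standard deviation of the variable.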

Function arguments

# calculate the p-value for the ANOVA
anova_results_oneway <- olink_anova(df = npx_data1, 
                                    variable = 'Site')
# extracting the significant proteins
anova_results_oneway_significant <- anova_results_oneway %>%
  filter(Threshold == 'Significant') %>%
  pull(OlinkID)
anova_posthoc_oneway_results <- olink_anova_posthoc(df = npx_data1,
                                                    olinkid_list = anova_results_oneway_significant,
                                                    variable = 'Site',
                                                    effect = 'Site')

Function output

A tibble with the following columns:

One way non-parametric test (olink_one_non_parametric)

The olink_one_non_parametric is a wrapper function that performs either a Kruskal-Wallis test or a Friedman test for each assay using the function kruskal.test from the R library stats or the function friedman_test from the R library rstatix. The posthoc test for Friedman test is performed using the function wilcox_test from the R library rstatix, whereas for Kruskal-Wallis test the function dunnTest from the R library FSA is applied. The function handles both factor and numerical variables, and/or confounding factors.

Samples with missing variable information or factor levels are excluded from the analysis. Character columns in the input data frame are converted to factors. The automatic handling of the data from above is announced by a message if the flag verbose=TRUE.

Adjusted p-values are calculated based on the Benjamini & Hochberg (1995) method (“fdr”). The threshold is determined by logic evaluation of Adjusted_pval < 0.05.
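
For orientation, the two underlying tests can be run on a single simulated assay with base R. Note that this sketch uses friedman.test from stats rather than friedman_test from rstatix, and that the data and column values are invented.

```r
set.seed(3)

# Simulated repeated measures: 8 subjects measured at 3 time points
toy <- data.frame(
  Subject = rep(paste0("P", 1:8), times = 3),
  Time    = rep(c("Baseline", "Week.6", "Week.12"), each = 8),
  NPX     = rnorm(24, mean = rep(c(5, 5.5, 6), each = 8))
)

# Kruskal-Wallis: treats the time points as independent groups
kw <- kruskal.test(NPX ~ Time, data = toy)

# Friedman: accounts for the repeated measures within each subject
fr <- friedman.test(NPX ~ Time | Subject, data = toy)

c(kruskal_p = kw$p.value, friedman_p = fr$p.value)
```

The Friedman test requires exactly one observation per subject and time point, which is why the dependence = TRUE case needs a subject argument.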

Function arguments

# Remove control samples
npx_df <- npx_data1 %>% filter(!grepl("control", SampleID, ignore.case = TRUE))
# One-way Kruskal-Wallis Test
kruskal_results <- olink_one_non_parametric(df = npx_df, 
                                            variable = "Time")
# One-way Friedman Test
friedman_results <- olink_one_non_parametric(df = npx_df, 
                                             variable = "Time", 
                                             subject = "Subject",
                                             dependence = TRUE)

Function output

A tibble with the following columns:

Post-hoc one way non-parametric analysis (olink_one_non_parametric_posthoc)

olink_one_non_parametric_posthoc performs a post-hoc Wilcoxon test using the function wilcox_test from the R library rstatix with Benjamini & Hochberg p-value adjustment per assay (by OlinkID) at confidence level 0.95. The function handles both factor and numerical variables and/or covariates.

Function arguments

#Friedman Test
Friedman_results <- olink_one_non_parametric(df = npx_data1, 
                                             variable = "Time", 
                                             subject = "Subject",
                                             dependence = TRUE)

#Filtering out significant and relevant results.
significant_assays <- Friedman_results %>%
  filter(Threshold == 'Significant') %>%
  dplyr::select(OlinkID) %>%
  distinct() %>%
  pull()

#Posthoc test for the results from Friedman Test
friedman_posthoc_results <- olink_one_non_parametric_posthoc(npx_data1, 
                                                             variable = "Time", 
                                                             test = "friedman",
                                                             olinkid_list = significant_assays)

Function output

A tibble with the following columns:

Regression models for ordinal data (olink_ordinalRegression)

The olink_ordinalRegression is a wrapper function that performs an ANOVA F-test for each assay (ordinal transformed) using the function Anova from the R library car and Type II sum of squares. The function handles both factor and numerical variables, and/or confounding factors.

Samples with missing variable information or factor levels are excluded from the analysis. Character columns in the input data frame are converted to factors. The automatic handling of the data from above is announced by a message if the flag verbose=TRUE.

Crossed/interaction analysis, i.e. A*B formula notation, is inferred from the variable argument in the following cases:

Inference is specified in a message if verbose=TRUE.

For covariates, crossed analyses need to be specified explicitly, i.e. two main effects will not be expanded with a c('A','B') notation. Main effects present in the variable take precedence. The formula notation of the final model is specified in a message if verbose=TRUE.

Adjusted p-values are calculated using the function p.adjust from the R library stats with the Benjamini & Hochberg (1995) method (“fdr”). The threshold is determined by logic evaluation of Adjusted_pval < 0.05. Covariates are not included in the p-value adjustment.

Function arguments

# Two-way ordinal regression, no covariates
ordinalRegression_results_twoway <- olink_ordinalRegression(df = npx_data1, 
                                                            variable = c('Site', 'Time'))
# One-way ordinal regression, Treatment as covariate
ordinalRegression_oneway <- olink_ordinalRegression(df = npx_data1, 
                                                    variable = 'Site',
                                                    covariates = 'Treatment')

Function output

A tibble with the following columns:

Post-hoc of regression models for ordinal data analysis (olink_ordinalRegression_posthoc)

olink_ordinalRegression_posthoc performs a post-hoc ANOVA test using the function emmeans from the R library emmeans with Tukey p-value adjustment per assay (by OlinkID) at confidence level 0.95. The function handles both factor and numerical variables and/or covariates.

Function arguments

# Two-way Ordinal Regression
ordinalRegression_results <- olink_ordinalRegression(df = npx_data1,
                             variable="Treatment:Time")
# extracting the significant proteins
significant_assays <- ordinalRegression_results %>% 
  filter(Threshold == 'Significant' & term == 'Treatment:Time') %>%
  select(OlinkID) %>%
  distinct() %>%
  pull()
# Posthoc test for the model NPX~Treatment*Time
ordinalRegression_posthoc_results <- olink_ordinalRegression_posthoc(npx_data1, 
                                                                     variable=c("Treatment:Time"),
                                                                     covariates="Site",
                                                                     olinkid_list = significant_assays,
                                                                     effect = "Treatment:Time")

Function output

A tibble with the following columns:

Linear mixed effects model analysis (olink_lmer)

The olink_lmer fits a linear mixed effects model for every protein (by OlinkID) in every panel, using the function lmer from the R library lmerTest and the function anova from the R library stats. The function handles both factor and numerical variables and/or covariates.

Samples with missing variable information or factor levels are excluded from the analysis. Character columns in the input data frame are converted to factors. The automatic handling of the data from above is announced by a message if the flag verbose=TRUE.

Crossed/interaction analysis, i.e. A*B formula notation, is inferred from the variable argument in the following cases:

Inference is specified in a message if verbose=TRUE.

For covariates, crossed analyses need to be specified explicitly, i.e. two main effects will not be expanded with a c('A','B') notation. Main effects present in the variable take precedence. The formula notation of the final model is specified in a message if verbose=TRUE.

Adjusted p-values are calculated using the function p.adjust from the R library stats with the Benjamini & Hochberg (1995) method (“fdr”). The threshold is determined by logic evaluation of Adjusted_pval < 0.05. Covariates are not included in the p-value adjustment.

Function arguments

# Linear mixed model with one variable.
lmer_results_oneway <- olink_lmer(df = npx_data1, 
                                  variable = 'Site',
                                  random = 'Subject')
# Linear mixed model with two variables.
lmer_results_twoway <- olink_lmer(df = npx_data1, 
                                  variable = c('Site', 'Treatment'),
                                  random = 'Subject')

Function output

A tibble with the following columns:

Post-hoc linear mixed effects model analysis (olink_lmer_posthoc)

The olink_lmer_posthoc function is similar to olink_lmer but performs a post-hoc analysis based on a linear mixed effects model using the function lmer from the R library lmerTest and the function emmeans from the R library emmeans. The function handles both factor and numerical variables and/or covariates. Differences in estimated marginal means are calculated for all pairwise levels of a given output variable. Degrees of freedom are estimated using Satterthwaite’s approximation. The post-hoc test for a numerical variable compares the difference in means of the outcome variable (default: NPX) for 1 standard deviation difference in the numerical variable, e.g. mean NPX at mean(numerical variable) versus mean NPX at mean(numerical variable) + 1*SD(numerical variable). The output tibble is arranged by ascending adjusted p-values.

Function arguments

# Linear mixed model with two variables.
lmer_results_twoway <- olink_lmer(df = npx_data1, 
                                  variable = c('Site', 'Treatment'),
                                  random = 'Subject')
# extracting the significant proteins
lmer_results_twoway_significant <- lmer_results_twoway %>%
  filter(Threshold == 'Significant', term == 'Treatment') %>%
  pull(OlinkID)
# performing post-hoc analysis
lmer_posthoc_twoway_results <- olink_lmer_posthoc(df = npx_data1,
                                                  olinkid_list = lmer_results_twoway_significant,
                                                  variable = c('Site', 'Treatment'),
                                                  random = 'Subject',
                                                  effect = 'Treatment') 

Function output

A tibble with the following columns:

Pathway Enrichment (olink_pathway_enrichment)

The olink_pathway_enrichment function can be used to perform Gene Set Enrichment Analysis (GSEA) or Over-representation Analysis (ORA) using MSigDB, Reactome, KEGG, or GO. MSigDB includes curated gene sets (C2) and ontology gene sets (C5), which encompass Reactome, KEGG, and GO. This function performs enrichment using the GSEA or enricher functions from the Bioconductor package clusterProfiler. The function uses the estimate from a previous statistical analysis for one contrast for all proteins. MSigDB is subset if the ontology is KEGG, GO, or Reactome. test_results must contain estimates for all assays. Post-hoc results can be used but should be filtered for one contrast to improve interpretability.

Alternative statistical results can be used as input as long as they include the columns "OlinkID", "Assay", and "estimate". A column named "Adjusted_pval" is also needed for ORA. Any statistical results that contain one estimate per protein will work as long as the estimates are comparable to each other.
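
To give a feel for what ORA tests, the sketch below computes an over-representation p-value for one pathway with a hypergeometric test in base R. The counts are invented, and this is a textbook formulation written for illustration, not a reproduction of the clusterProfiler code path.

```r
# Hypothetical counts for a single pathway
N <- 100  # proteins measured (background)
K <- 10   # measured proteins annotated to the pathway
n <- 20   # proteins called significant (Adjusted_pval < 0.05)
k <- 5    # significant proteins that are in the pathway

# Probability of observing k or more pathway members among the significant proteins
p_ora <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
p_ora
```

The Adjusted_pval column is what defines the list of significant proteins in this picture, which is why ORA requires it while GSEA, which ranks all proteins by their estimate, does not.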

clusterProfiler was originally developed by Guangchuang Yu at the School of Basic Medical Sciences at Southern Medical University.

T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141. doi: 10.1016/j.xinn.2021.100141

Function arguments

npx_df <- npx_data1 %>% filter(!grepl("control", SampleID, ignore.case = TRUE))
ttest_results <- olink_ttest(
  df = npx_df,
  variable = "Treatment",
  alternative = "two.sided")

try({ # This expression might fail if dependencies are not installed
gsea_results <- olink_pathway_enrichment(data = npx_data1, test_results = ttest_results)
ora_results <- olink_pathway_enrichment(
  data = npx_data1,
  test_results = ttest_results, method = "ORA")
}, silent = TRUE)

Function output

A data frame of enrichment results. Columns for ORA include:

Columns for GSEA:

Exploratory analysis

Principal components analysis (PCA) plot (olink_pca_plot)

Generates a PCA projection of all samples from NPX data along two principal components (default PC2 vs. PC1), colored by the variable QC_Warning and including the percentage of explained variance. The function uses prcomp and ggplot from the R libraries stats and ggplot2, respectively.

By default, the values are scaled and centered in the PCA, and proteins with missing NPX values are removed from the corresponding assay. Unique sample names are required. Imputation by the median is done for assays with missingness <10% for multi-plate projects and <5% for single plate projects. The plot is printed, and a list of ggplot objects is returned.

If byPanel = TRUE, the data processing (imputation of missing values etc) and subsequent PCA is performed separately per panel. A faceted plot is printed, while the individual ggplot objects are returned.

The arguments outlierDefX and outlierDefY can be used to identify outliers in the PCA. Samples more than +/-outlierDef[X,Y] standard deviations from the mean of the plotted PC will be labelled. Both arguments have to be specified.
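
The processing steps described above (median imputation, centered and scaled PCA, explained variance, outlier labelling) can be sketched in base R on a simulated matrix. The data are random, the missingness thresholds are not modelled, and flagging a sample when it exceeds either limit is an assumption; this is not the code of olink_pca_plot.

```r
set.seed(10)

# Simulated sample-by-assay NPX matrix with one missing value
X <- matrix(rnorm(30 * 5, mean = 5), nrow = 30,
            dimnames = list(paste0("S", 1:30), paste0("Assay", 1:5)))
X[1, 2] <- NA

# Impute missing values with the assay (column) median
X_imp <- apply(X, 2, function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
})

# Centered and scaled PCA, with percentage of explained variance
pca <- prcomp(X_imp, center = TRUE, scale. = TRUE)
explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Label samples more than k standard deviations from the mean of a PC
beyond <- function(v, k) abs(v - mean(v)) > k * sd(v)
scores <- as.data.frame(pca$x[, 1:2])
scores$Outlier <- as.integer(beyond(scores$PC1, 2.5) | beyond(scores$PC2, 4))
head(scores)
```

The values 2.5 and 4 play the role of outlierDefX and outlierDefY for the two plotted components.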

Function arguments (selection)

npx_data1 %>% 
  filter(!str_detect(SampleID, 'CONTROL_SAMPLE')) %>% 
  olink_pca_plot(df = .,
                 color_g = "QC_Warning", byPanel = TRUE)  

Function output

A list of objects of class ggplot (silently returned). Plots are also printed unless option quiet = TRUE is set. If outlierDefX and outlierDefY are specified, a list of outliers can be extracted from the ggplot object based on these parameters.

npx_data <- npx_data1 %>%
    mutate(SampleID = paste(SampleID, "_", Index, sep = ""))
g <- olink_pca_plot(df=npx_data, color_g = "QC_Warning",
                    outlierDefX = 2.5, outlierDefY = 4, byPanel = TRUE, quiet = TRUE)
lapply(g, function(x){x$data}) %>%
  bind_rows() %>%
  filter(Outlier == 1) %>% 
  select(SampleID, Outlier, Panel)

Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) (olink_umap_plot)

Computes a manifold approximation and projection using umap::umap and plots the two specified components. Unique sample names are required and imputation by the median is done for assays with missingness <10% for multi-plate projects and <5% for single plate projects.

If byPanel = TRUE, the data processing (imputation of missing values etc) and subsequent UMAP is performed separately per panel. A faceted plot is printed, while the individual ggplot objects are returned.

The arguments outlierDefX and outlierDefY can be used to identify outliers in the UMAP results. Samples more than +/-outlierDef[X,Y] standard deviations from the mean of the plotted UMAP component will be labelled. Both arguments have to be specified. NOTE: UMAP is a non-linear data transformation that might not accurately preserve the properties of the data. Distances in the UMAP plane should therefore be interpreted with caution.

Function arguments (selection)

npx_data1 %>% 
  filter(!str_detect(SampleID, 'CONTROL_SAMPLE')) %>% 
  olink_umap_plot(df = .,
                  color_g = "QC_Warning", byPanel = TRUE)  

Function output

A list of objects of class ggplot (silently returned). Plots are also printed unless option quiet = TRUE is set.

Visualization

Boxplots for outcomes (olink_boxplot)

The olink_boxplot function is used to generate boxplots of NPX values stratified on a variable for a given list of proteins. olink_boxplot uses the functions ggplot and geom_boxplot of the R library ggplot2.

Function arguments

plot <- npx_data1 %>%
  na.omit() %>% # removing missing values which exist for Site
  olink_boxplot(variable = "Site", 
                olinkid_list = c("OID00488", "OID01276"),
                number_of_proteins_per_plot  = 2)
plot[[1]]

anova_posthoc_results <- npx_data1 %>% 
  olink_anova_posthoc(olinkid_list = c("OID00488", "OID01276"),
                      variable = 'Site',
                      effect = 'Site')

plot2 <- npx_data1 %>%
  na.omit() %>% # removing missing values which exist for Site
  olink_boxplot(variable = "Site", 
                olinkid_list = c("OID00488", "OID01276"),
                number_of_proteins_per_plot  = 2,
                posthoc_results = anova_posthoc_results)

plot2[[1]]

Function output

A list of objects of class ggplot.

Note: Plots will not appear in the Plots panel of RStudio unless the result is assigned to a variable and then printed (see sample code above).

Boxplots for QC (olink_dist_plot)

The olink_dist_plot function generates boxplots of NPX values for each sample, faceted by Olink panel. This is used as an initial QC step to identify potential outliers. olink_dist_plot uses the functions ggplot and geom_boxplot of the R library ggplot2.

Function arguments

npx_data1 %>% 
  filter(Panel == 'Olink Cardiometabolic') %>% # For this example only plotting one panel.
  olink_dist_plot() +
  theme(axis.text.x = element_blank()) # Due to the number of samples one can remove the text or rotate it

Function output

A ggplot object.

Point-range plot for LMER (olink_lmer_plot)

The function olink_lmer_plot generates a point-range plot for a given list of proteins based on linear mixed effect model. The points illustrate the mean NPX level for each group and the error bars illustrate 95% confidence intervals. Facets are labeled by the protein name and corresponding OlinkID for the protein.

Function arguments

plot <- olink_lmer_plot(df = npx_data1, 
                        olinkid_list = c("OID01216", "OID01217"), 
                        variable = c('Site', 'Treatment'), 
                        x_axis_variable =  'Site',
                        col_variable = 'Treatment',
                        random = 'Subject')
plot[[1]]

Function output

A list of objects of class ggplot.

Note: Plots will not appear in the Plots panel of RStudio unless the result is assigned to a variable and then printed (see sample code above).

Heatmap for visualizing pathway enrichment (olink_pathway_heatmap)

The olink_pathway_heatmap function generates a heatmap of proteins related to pathways using the enrichment results from the olink_pathway_enrichment function. Either the top terms or the terms containing a certain keyword can be visualized. For each term, the proteins in the test_results data frame that are related to that term are visualized by their estimate. This visualization can be used to determine how many proteins of interest are involved in a particular pathway and in which direction their estimates point.

Function arguments

# GSEA Heatmap from t-test results
try({ # This expression might fail if dependencies are not installed
olink_pathway_heatmap(enrich_results = gsea_results, test_results = ttest_results)
})
# ORA Heatmap from t-test results with cell keyword
try({ # This expression might fail if dependencies are not installed
olink_pathway_heatmap(enrich_results = ora_results, test_results = ttest_results,
                      method = "ORA", keyword = "cell")
})

Function output

A heatmap as a ggplot object.

Bargraph for visualizing pathway enrichment (olink_pathway_visualization)

The olink_pathway_visualization function generates a bar graph of the top terms or terms related to a certain keyword for results from the olink_pathway_enrichment function. The bar represents either the normalized enrichment score (NES) for GSEA results or counts (number of proteins) for ORA results colored by adjusted p-value. The ORA visualization also contains the number of proteins out of the total proteins in that pathway as a ratio after the bar.

Function arguments

Function output

A bar graph as a ggplot object.

Scatterplot for QC (olink_qc_plot)

The olink_qc_plot function generates a facet plot per Panel, plotting IQR vs. median for all samples, using ggplot and geom_point from the R library ggplot2 and IQR from the R library stats. This is a good first check to find out whether any samples tend to be classified as outliers. Horizontal dashed lines indicate +/-3 standard deviations from the mean IQR. Vertical dashed lines indicate +/-3 standard deviations from the mean sample median.
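
The quantities behind this plot are easy to compute by hand. The base R sketch below works on simulated data for a single panel, with one sample shifted on purpose; the column names mirror those used in this vignette, but the rule of flagging a sample that falls outside either set of limits is an assumption for illustration.

```r
set.seed(5)

# Simulated long-format data: 40 samples x 20 assays, one sample shifted upwards
npx <- data.frame(SampleID = rep(paste0("S", 1:40), each = 20),
                  NPX = rnorm(800, mean = 5))
shifted <- npx$SampleID == "S40"
npx$NPX[shifted] <- npx$NPX[shifted] + 5

# Per-sample median and IQR
med <- tapply(npx$NPX, npx$SampleID, median)
iqr <- tapply(npx$NPX, npx$SampleID, IQR)
qc <- data.frame(SampleID = names(med),
                 sample_median = as.numeric(med),
                 IQR = as.numeric(iqr))

# Flag samples more than 3 SD from the mean sample median or the mean IQR
beyond <- function(v, k) abs(v - mean(v)) > k * sd(v)
qc$Outlier <- as.integer(beyond(qc$sample_median, 3) | beyond(qc$IQR, 3))
qc[qc$Outlier == 1, ]
```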

Function arguments

npx_data1 %>% 
  filter(!str_detect(SampleID, 'CONTROL_SAMPLE'),
         Panel == 'Olink Inflammation') %>% 
  olink_qc_plot(color_g = "QC_Warning")   

Function output

An object of class ggplot. A list of outliers can be extracted from the ggplot object.

qc <- olink_qc_plot(npx_data1, color_g = "QC_Warning", IQR_outlierDef = 3, median_outlierDef = 3)
qc$data %>% filter(Outlier == 1) %>% select(SampleID, Panel, IQR, sample_median, Outlier)

Heatmap (olink_heatmap_plot)

The olink_heatmap_plot function generates a heatmap for all samples and proteins using pheatmap::pheatmap. By default, the heatmap centers and scales NPX across all proteins and clusters samples and proteins using a dendrogram. Unique sample names are required.

A grouping variable can be annotated and colored on the left side of the heatmap.

Function arguments

first10 <- npx_data1 %>%
  pull(OlinkID) %>% 
  unique() %>% 
  head(10)

first15samples <- npx_data1$SampleID %>% 
  unique() %>% 
  head(15)

npx_data_small <- npx_data1 %>% 
  filter(!str_detect(SampleID, 'CONT')) %>% 
  filter(OlinkID %in% first10) %>% 
  filter(SampleID %in% first15samples)

olink_heatmap_plot(npx_data_small, variable_row_list =  'Treatment')

Function output

An object of class ggplot.

Plot results of t-test (olink_volcano_plot)

The olink_volcano_plot function generates a volcano plot from the results of the olink_ttest function, using the functions ggplot and geom_point of the R library ggplot2. The estimated difference is shown on the x-axis and -log10(p-value) on the y-axis. A horizontal dotted line indicates p-value = 0.05. Dots are colored based on the Benjamini-Hochberg adjusted p-value cutoff of 0.05 and can optionally be annotated by OlinkID.

Function arguments

# perform t-test
ttest_results <- olink_ttest(df = npx_data1,
                             variable = 'Treatment')
# select names of proteins to show
top_10_name <- ttest_results %>%
  slice_head(n = 10) %>%
  pull(OlinkID)
# volcano plot
olink_volcano_plot(p.val_tbl = ttest_results,
                   x_lab = 'Treatment',
                   olinkid_list = top_10_name)

Function output

An object of class ggplot.

Theming function (set_plot_theme)

This function sets a coherent plot theme for plots by adding it to a ggplot object. It is mainly used for aesthetic reasons.

npx_data1 %>% 
  filter(OlinkID == 'OID01216') %>% 
  ggplot(aes(x = Treatment, y = NPX, fill = Treatment)) +
  geom_boxplot() +
  set_plot_theme()

Color theming (olink_color_discrete, olink_color_gradient, olink_fill_discrete, olink_fill_gradient)

These functions set a coherent coloring theme for the plots by adding it to a ggplot object. They are mainly used for aesthetic reasons.

npx_data1 %>% 
  filter(OlinkID == 'OID01216') %>% 
  ggplot(aes(x = Treatment, y = NPX, fill = Treatment)) +
  geom_boxplot() +
  set_plot_theme() +
  olink_fill_discrete()



OlinkAnalyze documentation built on Nov. 4, 2023, 1:07 a.m.