olink_normalization: Normalization of all proteins (by OlinkID).

View source: R/Olink_normalization.R

olink_normalization    R Documentation

Description

Normalizes an NPX dataframe to another dataframe or to reference medians. When two dataframes are normalized to one another, Olink's default is to use the older dataframe as the reference. The function handles three types of normalization:

Bridging normalization: One dataframe is adjusted to the other using overlapping samples (bridge samples). The overlapping samples must have the same names in both dataframes, and the adjustment is made using the median of the paired differences between the bridge samples in the two dataframes. The two dataframes are the inputs df1 and df2, the dataframe being adjusted to is specified by reference_project, and the overlapping samples are specified in overlapping_samples_df1. Only overlapping_samples_df1 should be supplied, regardless of which dataframe is used as reference_project.

Subset normalization: One dataframe is adjusted to the other using a subset of samples from each. Adjustment is made using the difference between the medians of the subsets from the two dataframes. Both overlapping_samples_df1 and overlapping_samples_df2 must be supplied. The samples do not need to have the same names.
A special case of subset normalization is to use all samples (except control samples and samples with a QC warning) from df1 as overlapping_samples_df1 and all such samples from df2 as overlapping_samples_df2.

Reference median normalization: This works on a single dataframe. It is effectively subset normalization, but uses the difference between the subset medians and pre-recorded median values. df1, overlapping_samples_df1 and reference_medians must be specified. df1 is adjusted using the differences between the medians of the overlapping samples and the reference medians.
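
The three adjustment factors described above can be illustrated for a single assay with base R. This is a minimal sketch on made-up numbers, not the package's implementation. All vectors are hypothetical, and the sign convention (reference minus the dataset being adjusted, with the factor then added to the adjusted dataset) is an assumption made for illustration.

```r
# Toy NPX values for one assay (all numbers are made up).
# Bridge samples S1..S4 were measured in both projects.
p1 <- c(S1 = 5.0, S2 = 6.0, S3 = 7.0, S4 = 8.0)  # reference project
p2 <- c(S1 = 4.5, S2 = 5.0, S3 = 6.5, S4 = 7.0)  # project to be adjusted

# Bridging: median of the paired differences between the bridge samples.
bridge_ids <- intersect(names(p1), names(p2))
adj_bridge <- median(p1[bridge_ids] - p2[bridge_ids])  # 0.75

# Subset: difference between the medians of two unpaired subsets.
subset_p1 <- c(5.0, 6.0, 7.0)
subset_p2 <- c(4.0, 5.5, 6.0, 9.0)
adj_subset <- median(subset_p1) - median(subset_p2)    # 0.25

# Reference median: difference between a pre-recorded median and the
# median of the selected samples in the single dataset.
reference_npx <- 6.5
adj_reference <- reference_npx - median(subset_p1)     # 0.5

# Assumed application step: add the factor to the dataset being adjusted.
p2_bridged <- p2 + adj_bridge
```

After the assumed bridging step, the median paired difference between p1 and p2_bridged is zero, which is the property the adjustment is meant to achieve for each assay.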

Usage

olink_normalization(
  df1,
  df2 = NULL,
  overlapping_samples_df1,
  overlapping_samples_df2 = NULL,
  df1_project_nr = "P1",
  df2_project_nr = "P2",
  reference_project = "P1",
  reference_medians = NULL
)

Arguments

df1

First dataframe to be used in normalization (required).

df2

Second dataframe to be used in normalization.

overlapping_samples_df1

Samples to be used for adjustment factor calculation in df1 (required).

overlapping_samples_df2

Samples to be used for adjustment factor calculation in df2. Required for subset normalization.

df1_project_nr

Project name of first dataset.

df2_project_nr

Project name of second dataset.

reference_project

Project name of the reference project, that is, the project to which the other project is adjusted. Must be the same as either df1_project_nr or df2_project_nr.

reference_medians

Dataframe which must contain the columns "OlinkID" and "Reference_NPX". Used for reference median normalization.
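
A reference_medians dataframe could, for instance, be derived from an earlier dataset by taking the per-assay median NPX. The sketch below uses base R on a small made-up dataframe. The object old_npx, its sample names, assay IDs and values are hypothetical; only the column names OlinkID and Reference_NPX come from the argument description above.

```r
# Hypothetical long-format NPX data from an earlier project (made-up values).
old_npx <- data.frame(
  SampleID = rep(c("A1", "A2", "A3"), times = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), each = 3),
  NPX      = c(4.0, 5.0, 6.5, 1.0, 3.0, 2.0)
)

# One median per assay, stored under the expected column name.
ref_median_df <- aggregate(NPX ~ OlinkID, data = old_npx, FUN = median)
names(ref_median_df)[names(ref_median_df) == "NPX"] <- "Reference_NPX"
```

The resulting ref_median_df has one row per OlinkID and could then be passed as reference_medians.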

Value

A "tibble" of NPX data in long format containing the normalized NPX values. It has the same columns as df1/df2, plus an additional column Adj_factor that holds the adjustment factor used in the normalization.
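
To check which adjustment was applied, the Adj_factor column can be summarised per assay and project. The sketch below runs on a small hand-made dataframe that only mimics the shape of the returned tibble. Its values, the assay ID, and the zero factor shown for the reference project are assumptions for illustration; with real output, the same call would be applied to the result of olink_normalization().

```r
# Hypothetical stand-in for the returned tibble (made-up values).
normalized <- data.frame(
  SampleID   = c("A1", "A2", "B1", "B2"),
  OlinkID    = "OID00001",
  Project    = c("P1", "P1", "P2", "P2"),
  NPX        = c(5.0, 6.0, 5.25, 6.25),
  Adj_factor = c(0, 0, 0.75, 0.75)
)

# One row per assay and project, showing the factor stored in Adj_factor.
adj_summary <- unique(normalized[, c("OlinkID", "Project", "Adj_factor")])
```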

Examples



# npx_data1 and npx_data2 are example datasets shipped with OlinkAnalyze
library(OlinkAnalyze)
library(dplyr)

npx_df1 <- npx_data1 %>% dplyr::mutate(Project = 'P1')
npx_df2 <- npx_data2 %>% dplyr::mutate(Project = 'P2')

#Bridging normalization:
# Find overlapping samples, but exclude Olink control
overlap_samples <- intersect((npx_df1 %>%
                               dplyr::filter(!grepl("control", SampleID,
                                                     ignore.case=TRUE)))$SampleID,
                             (npx_df2 %>%
                               dplyr::filter(!grepl("control", SampleID,
                                                     ignore.case=TRUE)))$SampleID)
# Normalize
olink_normalization(df1 = npx_df1,
                    df2 = npx_df2,
                    overlapping_samples_df1 = overlap_samples,
                    df1_project_nr = 'P1',
                    df2_project_nr = 'P2',
                    reference_project = 'P1')

#Subset normalization:
# Find a suitable subset of samples from both projects, but exclude Olink controls
# and samples which do not pass QC.
df1_sampleIDs <- npx_df1 %>%
    dplyr::filter(QC_Warning == 'Pass') %>%
    dplyr::filter(!stringr::str_detect(SampleID, 'CONTROL_SAMPLE')) %>%
    dplyr::select(SampleID) %>%
    unique() %>%
    dplyr::pull(SampleID)
df2_sampleIDs <- npx_df2 %>%
    dplyr::filter(QC_Warning == 'Pass') %>%
    dplyr::filter(!stringr::str_detect(SampleID, 'CONTROL_SAMPLE')) %>%
    dplyr::select(SampleID) %>%
    unique() %>%
    dplyr::pull(SampleID)
# Randomly draw 16 samples per project (seed set so the subset is reproducible)
set.seed(1)
some_samples_df1 <- sample(df1_sampleIDs, 16)
some_samples_df2 <- sample(df2_sampleIDs, 16)

olink_normalization(df1 = npx_df1,
                    df2 = npx_df2,
                    overlapping_samples_df1 = some_samples_df1,
                    overlapping_samples_df2 = some_samples_df2)


## Special case of subset normalization when using all samples.
olink_normalization(df1 = npx_df1,
                    df2 = npx_df2,
                    overlapping_samples_df1 = df1_sampleIDs,
                    overlapping_samples_df2 = df2_sampleIDs)


#Reference median normalization:
# For the sake of this example, set the reference median to 1
ref_median_df <- npx_df1 %>%
    dplyr::select(OlinkID) %>%
    dplyr::distinct() %>%
    dplyr::mutate(Reference_NPX = 1)
# Normalize
olink_normalization(df1 = npx_df1,
                    overlapping_samples_df1 = some_samples_df1,
                    reference_medians = ref_median_df)


OlinkAnalyze documentation built on Nov. 4, 2023, 1:07 a.m.