View source: R/olink_normalization.R
olink_normalization | R Documentation |
Normalizes two Olink datasets to each other, or one Olink dataset to a reference set of median values.
olink_normalization(
df1,
df2 = NULL,
overlapping_samples_df1,
overlapping_samples_df2 = NULL,
df1_project_nr = "P1",
df2_project_nr = "P2",
reference_project = "P1",
reference_medians = NULL,
format = FALSE
)
df1 |
First dataset to be used for normalization (required). |
df2 |
Second dataset to be used for normalization. Required for bridge and subset normalization. |
overlapping_samples_df1 |
Character vector of samples to be used for the
calculation of adjustment factors in df1. |
overlapping_samples_df2 |
Character vector of samples to be used for the
calculation of adjustment factors in df2. |
df1_project_nr |
Project name of first dataset (required). |
df2_project_nr |
Project name of second dataset. Required for bridge and subset normalization. |
reference_project |
Project to be used as reference project. Should
be one of df1_project_nr or df2_project_nr. |
reference_medians |
Dataset with columns "OlinkID" and "Reference_NPX". Required for reference median normalization. |
format |
Boolean that controls whether the normalized dataset will be formatted for input to downstream analysis. Only applicable for cross-product bridge normalization. |
The function handles four different types of normalization:
Bridge normalization: One of the datasets is adjusted to another
using overlapping samples (bridge samples). Overlapping samples need to have
the same identifiers in both datasets. Normalization is performed using the
median of the pair-wise differences between the bridge samples in the two
datasets. The two datasets are provided as df1 and df2, and the one
being adjusted to is specified in the input reference_project; overlapping
samples are specified in overlapping_samples_df1. Only
overlapping_samples_df1 should be provided regardless of the dataset used
as reference_project.
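The arithmetic behind bridge normalization can be sketched in base R on toy data. This is a minimal illustration of "median of the pair-wise differences", not the package's implementation: the data values are made up, the helper objects (paired, adj, df2_adj) are hypothetical, and the sign convention (reference minus non-reference, added to the non-reference dataset) is an assumption.

```r
# Toy long-format data: two projects sharing bridge samples S1-S3.
# Column names mirror the documented ones; all values are invented.
df1 <- data.frame(
  SampleID = rep(c("S1", "S2", "S3"), times = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), each = 3),
  NPX      = c(5.0, 6.0, 7.0, 2.0, 2.5, 3.0)
)
df2 <- data.frame(
  SampleID = rep(c("S1", "S2", "S3"), times = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), each = 3),
  NPX      = c(4.0, 5.5, 6.0, 3.0, 3.0, 4.5)
)

# Pair each bridge sample's measurement across the two datasets;
# df1 plays the role of the reference project here.
paired <- merge(df1, df2, by = c("SampleID", "OlinkID"),
                suffixes = c("_ref", "_new"))

# Per-assay adjustment factor: median of the pair-wise differences
# (assumed sign convention: reference minus non-reference).
paired$diff <- paired$NPX_ref - paired$NPX_new
adj <- aggregate(diff ~ OlinkID, data = paired, FUN = median)
names(adj)[2] <- "Adj_factor"

# Shift the non-reference dataset by its per-assay factor.
df2_adj <- merge(df2, adj, by = "OlinkID")
df2_adj$NPX <- df2_adj$NPX + df2_adj$Adj_factor
```

Because the factor is computed per assay from paired samples, the sample identifiers must match across the two datasets, which is why only one vector of overlapping samples is needed.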
Subset normalization: One of the datasets is adjusted to another
using a subset of samples from each. Normalization is performed using the
differences of the medians between the subsets from the two datasets. Both
overlapping_samples_df1 and overlapping_samples_df2 need to be provided,
and sample identifiers do not need to be the same.
A special case of subset normalization occurs when all samples (except
control samples and samples with QC warnings) from each dataset are used
for normalization; this special case is called intensity normalization. In
intensity normalization all unique sample identifiers from df1 are
provided as input in overlapping_samples_df1 and all unique sample
identifiers from df2 are provided as input in overlapping_samples_df2.
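As a contrast with the pair-wise approach, the "difference of the medians" used in subset normalization can be sketched as follows. This is an illustrative base R sketch on invented data, with hypothetical object names (subset_df1, subset_df2, med_ref, med_new, adj) and an assumed sign convention; it is not the package's code.

```r
# Toy data: the two datasets do not share any sample identifiers.
df1 <- data.frame(
  SampleID = rep(c("A1", "A2", "A3", "A4"), times = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), each = 4),
  NPX      = c(4, 5, 6, 9, 1, 2, 3, 4)
)
df2 <- data.frame(
  SampleID = rep(c("B1", "B2", "B3"), times = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), each = 3),
  NPX      = c(2, 3, 7, 2, 4, 6)
)

# Hypothetical subsets, one per dataset (df1 acts as the reference).
subset_df1 <- c("A1", "A2", "A3")
subset_df2 <- c("B1", "B2", "B3")

# Per-assay median within each subset.
med_ref <- aggregate(NPX ~ OlinkID,
                     data = df1[df1$SampleID %in% subset_df1, ],
                     FUN = median)
med_new <- aggregate(NPX ~ OlinkID,
                     data = df2[df2$SampleID %in% subset_df2, ],
                     FUN = median)

# Adjustment factor: difference of the two medians
# (assumed sign convention: reference minus non-reference).
adj <- merge(med_ref, med_new, by = "OlinkID",
             suffixes = c("_ref", "_new"))
adj$Adj_factor <- adj$NPX_ref - adj$NPX_new

# Shift the non-reference dataset by its per-assay factor.
df2_adj <- merge(df2, adj[, c("OlinkID", "Adj_factor")], by = "OlinkID")
df2_adj$NPX <- df2_adj$NPX + df2_adj$Adj_factor
```

Passing every eligible sample identifier in both subsets turns the same computation into the intensity normalization special case described above.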
Reference median normalization: One of the datasets (df1) is
adjusted to a predefined set of adjustment factors. This is effectively
subset normalization, but using differences of medians to pre-recorded
median values. df1, overlapping_samples_df1, df1_project_nr and
reference_medians need to be specified. Dataset df1 is normalized using
the differences in median between the overlapping samples and the reference
medians.
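The same idea with pre-recorded medians can be sketched as below. The reference table uses the documented columns "OlinkID" and "Reference_NPX"; everything else (the toy values, the selected vector, the Observed_median column, and the sign convention) is an assumption made for illustration only.

```r
# Toy dataset to be normalized.
df1 <- data.frame(
  SampleID = rep(c("S1", "S2", "S3"), times = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), each = 3),
  NPX      = c(5.0, 6.0, 7.0, 2.0, 2.5, 3.0)
)

# Pre-recorded medians, in the shape expected by reference_medians.
reference_medians <- data.frame(
  OlinkID       = c("OID00001", "OID00002"),
  Reference_NPX = c(5.5, 3.0)
)

# Samples used to compute the observed medians (hypothetical choice).
selected <- c("S1", "S2", "S3")

# Observed per-assay median among the selected samples.
obs <- aggregate(NPX ~ OlinkID,
                 data = df1[df1$SampleID %in% selected, ],
                 FUN = median)
names(obs)[2] <- "Observed_median"

# Adjustment factor: reference median minus observed median
# (assumed sign convention).
adj <- merge(reference_medians, obs, by = "OlinkID")
adj$Adj_factor <- adj$Reference_NPX - adj$Observed_median

# Shift df1 so that the selected samples' medians match the reference.
df1_adj <- merge(df1, adj[, c("OlinkID", "Adj_factor")], by = "OlinkID")
df1_adj$NPX <- df1_adj$NPX + df1_adj$Adj_factor
```

After the shift, the per-assay medians of the selected samples in df1_adj coincide with Reference_NPX, which is the property this type of normalization is aiming for.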
Cross-product normalization: One of the datasets is adjusted to
another using the median of pair-wise differences of overlapping samples
(bridge samples) or quantile smoothing using overlapping
samples as reference to adjust the distributions. Overlapping samples need
to have the same identifiers in both datasets. The two datasets are provided
as df1 and df2, and the one being adjusted to is specified in the input
reference_project. Note that in cross-product normalization the
reference project is predefined, and if the argument reference_project
does not match the expected reference project, an error
is returned. Overlapping samples are specified in
overlapping_samples_df1. Only overlapping_samples_df1 should be provided
regardless of the dataset used as reference_project. This functionality
does not modify the column with the original quantification values
(e.g. NPX). Instead, it normalizes them with two different approaches, stored
in columns "MedianCenteredNPX" and "QSNormalizedNPX", and provides a
recommendation in "BridgingRecommendation" about which of the two columns
to use.
The output dataset is df1 for reference median normalization, or df2
appended to df1 for bridge, subset or cross-product normalization. The
output dataset contains all original columns from the original dataset(s),
and the columns:
"Project" and "Adj_factor" in case of reference median, bridge and subset
normalization. The former marks the project of origin based on
df1_project_nr and df2_project_nr, and the latter the adjustment factor
that was applied to the non-reference dataset.
"Project", "OlinkID_E3072", "MedianCenteredNPX", "QSNormalizedNPX",
"BridgingRecommendation" in case of cross-product normalization. The columns
correspond to the project of origin based on df1_project_nr and
df2_project_nr, the assay identifier in the non-reference project, the
bridge-normalized quantification value, the quantile smoothing-normalized
quantification value, and the recommendation about which of the two
normalized values is more suitable for downstream analysis.
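For the cross-product case, one way to act on the recommendation column is to pick the matching normalized value row by row. The sketch below uses a hand-built toy table with the documented column names; the recommendation labels ("MedianCentering", "QuantileSmoothing", "NotBridgeable") and the NPX_selected column are assumptions made for illustration and should be checked against the actual output.

```r
# Toy stand-in for cross-product output; values are invented and the
# recommendation labels are assumed, not taken from the package.
normalized <- data.frame(
  OlinkID                = c("OID00001", "OID00002", "OID00003"),
  NPX                    = c(1.0, 3.0, 2.0),
  MedianCenteredNPX      = c(1.2, 3.1, 2.4),
  QSNormalizedNPX        = c(1.3, 2.9, 2.2),
  BridgingRecommendation = c("MedianCentering",
                             "QuantileSmoothing",
                             "NotBridgeable")
)

# Pick the recommended value per row; rows without a usable
# recommendation are set to NA (a choice made for this sketch).
normalized$NPX_selected <- with(
  normalized,
  ifelse(BridgingRecommendation == "MedianCentering",
         MedianCenteredNPX,
         ifelse(BridgingRecommendation == "QuantileSmoothing",
                QSNormalizedNPX,
                NA_real_))
)
```

Keeping the original NPX column untouched, as the function does, makes it possible to compare all three versions of a value side by side before committing to one.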
Tibble or ArrowObject with the normalized dataset.
# prepare datasets (npx_data1 and npx_data2 ship with OlinkAnalyze)
library(OlinkAnalyze)
npx_df1 <- npx_data1 |>
  dplyr::mutate(
    Normalization = "Intensity"
  )
npx_df2 <- npx_data2 |>
  dplyr::mutate(
    Normalization = "Intensity"
  )
# bridge normalization
# overlapping samples - exclude control samples
overlap_samples <- intersect(x = npx_df1$SampleID,
                             y = npx_df2$SampleID) |>
  (\(x) x[!grepl("^CONTROL_SAMPLE", x)])()
# normalize
olink_normalization(
  df1 = npx_df1,
  df2 = npx_df2,
  overlapping_samples_df1 = overlap_samples,
  df1_project_nr = "P1",
  df2_project_nr = "P2",
  reference_project = "P1"
)
# subset normalization
# find a suitable subset of samples from each dataset:
# exclude control samples
# exclude samples that do not pass QC
df1_samples <- npx_df1 |>
  dplyr::group_by(
    dplyr::pick(
      dplyr::all_of("SampleID")
    )
  ) |>
  dplyr::filter(
    all(.data[["QC_Warning"]] == "Pass")
  ) |>
  dplyr::ungroup() |>
  dplyr::filter(
    !grepl(pattern = "^CONTROL_SAMPLE", x = .data[["SampleID"]])
  ) |>
  dplyr::pull(
    .data[["SampleID"]]
  ) |>
  unique()
df2_samples <- npx_df2 |>
  dplyr::group_by(
    dplyr::pick(
      dplyr::all_of("SampleID")
    )
  ) |>
  dplyr::filter(
    all(.data[["QC_Warning"]] == "Pass")
  ) |>
  dplyr::ungroup() |>
  dplyr::filter(
    !grepl(pattern = "^CONTROL_SAMPLE", x = .data[["SampleID"]])
  ) |>
  dplyr::pull(
    .data[["SampleID"]]
  ) |>
  unique()
# select a subset of samples from each set from above
df1_subset <- sample(x = df1_samples, size = 16L)
df2_subset <- sample(x = df2_samples, size = 20L)
# normalize
olink_normalization(
  df1 = npx_df1,
  df2 = npx_df2,
  overlapping_samples_df1 = df1_subset,
  overlapping_samples_df2 = df2_subset,
  df1_project_nr = "P1",
  df2_project_nr = "P2",
  reference_project = "P1"
)
# special case of subset normalization using all samples
olink_normalization(
  df1 = npx_df1,
  df2 = npx_df2,
  overlapping_samples_df1 = df1_samples,
  overlapping_samples_df2 = df2_samples,
  df1_project_nr = "P1",
  df2_project_nr = "P2",
  reference_project = "P1"
)
# reference median normalization
# For the sake of this example, draw the reference medians at random
# from a uniform distribution between -1 and 1
ref_med_df <- npx_data1 |>
  dplyr::select(
    dplyr::all_of(
      c("OlinkID")
    )
  ) |>
  dplyr::distinct() |>
  dplyr::mutate(
    Reference_NPX = runif(n = dplyr::n(),
                          min = -1,
                          max = 1)
  )
# normalize
olink_normalization(
  df1 = npx_df1,
  overlapping_samples_df1 = df1_subset,
  reference_medians = ref_med_df
)
# cross-product normalization
# get reference samples - exclude control samples
overlap_samples_product <- intersect(
  x = unique(OlinkAnalyze:::data_ht_small$SampleID),
  y = unique(OlinkAnalyze:::data_3k_small$SampleID)
) |>
  (\(.) .[!grepl("CONTROL", .)])()
# normalize
olink_normalization(
  df1 = OlinkAnalyze:::data_ht_small,
  df2 = OlinkAnalyze:::data_3k_small,
  overlapping_samples_df1 = overlap_samples_product,
  df1_project_nr = "proj_ht",
  df2_project_nr = "proj_3k",
  reference_project = "proj_ht",
  format = FALSE
)