| clean_npx | R Documentation |
This function applies a series of cleaning steps to a data set exported by
Olink Software and imported into R by read_npx(). Some of the steps of this
function rely on results from check_npx().
This function removes samples and assays that are not suitable for downstream statistical analysis. Removed records include duplicate sample identifiers, external control samples, internal control assays, and samples or assays with quality control flags.
clean_npx(
  df,
  check_log = NULL,
  remove_assay_na = TRUE,
  remove_invalid_oid = TRUE,
  remove_dup_sample_id = TRUE,
  remove_control_assay = TRUE,
  remove_control_sample = TRUE,
  remove_qc_warning = TRUE,
  remove_assay_warning = TRUE,
  control_sample_ids = NULL,
  convert_df_cols = TRUE,
  convert_nonunique_uniprot = TRUE,
  out_df = "tibble",
  verbose = FALSE
)
df |
A "tibble" or "ArrowObject"
from |
check_log |
A named list returned by |
remove_assay_na |
Logical. If |
remove_invalid_oid |
Logical. If |
remove_dup_sample_id |
Logical. If |
remove_control_assay |
If |
remove_control_sample |
If |
remove_qc_warning |
Logical. If |
remove_assay_warning |
Logical. If |
control_sample_ids |
character vector of sample identifiers of control
samples. Default |
convert_df_cols |
Logical. If |
convert_nonunique_uniprot |
Logical. If |
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
verbose |
Logical. If |
The pipeline performs the following steps:
Remove assays with invalid identifiers: assays flagged as having
invalid identifiers from check_npx(). Occurs when the original data set
provided by Olink Software has been modified.
Remove assays with NA quantification values: assays lacking
quantification data are reported with NA as quantification. These assays
are identified in check_npx().
Remove samples with duplicate identifiers: samples with identical
identifiers detected by check_npx(). Duplicate sample identifiers cause
errors in downstream analysis of the data and are highly discouraged.
Remove external control samples:
Uses column marking sample type (e.g. SampleType) to exclude external
control samples.
Uses column marking sample identifier (e.g. SampleID) to remove
external control samples, or samples that one wants to exclude from the
downstream analysis.
Remove samples failing quality control: samples with QC status FAIL.
Remove internal control assays: Uses column marking assay type (e.g.
AssayType) to exclude internal control assays.
Remove assays with quality control warnings: assays with QC status
WARN.
Correct column data type: ensure that certain columns have the
expected data type (class). These columns are identified in check_npx().
Resolve multiple UniProt mappings per assay: ensure that each assay
identifier (e.g., OlinkID) maps uniquely to a single UniProt ID.
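The row-filtering steps above can be illustrated with a minimal base R sketch on a toy long-format data frame. This is not the implementation used by clean_npx(). The column names follow the examples in the text (SampleID, SampleType, AssayType, OlinkID), while SampleQC, the values "SAMPLE", "CONTROL", "assay" and "ext_ctrl", and the choice to drop every copy of a duplicated sample identifier are assumptions made for illustration only.

```r
# Toy long-format data: 6 sample IDs x 2 assays.
# All values and the SampleQC column name are illustrative assumptions.
npx <- data.frame(
  SampleID   = c("S1", "S1", "S2", "S2", "S3", "S3",
                 "S3", "S3", "S4", "S4", "CTRL1", "CTRL1"),
  SampleType = c(rep("SAMPLE", 10), rep("CONTROL", 2)),
  OlinkID    = rep(c("OID00001", "OID00002"), times = 6),
  AssayType  = rep(c("assay", "ext_ctrl"), times = 6),
  SampleQC   = c("PASS", "PASS", "FAIL", "FAIL", rep("PASS", 8)),
  NPX        = seq_len(12) / 10,
  stringsAsFactors = FALSE
)

# 1. Duplicate sample identifiers: a SampleID that appears more than once
#    for the same assay. Here every copy is dropped (an assumption).
key     <- paste(npx$SampleID, npx$OlinkID)
dup_ids <- unique(npx$SampleID[duplicated(key)])
cleaned <- npx[!(npx$SampleID %in% dup_ids), ]

# 2. External control samples, identified by the sample type column.
cleaned <- cleaned[cleaned$SampleType == "SAMPLE", ]

# 3. Samples failing quality control.
cleaned <- cleaned[cleaned$SampleQC != "FAIL", ]

# 4. Internal control assays, identified by the assay type column.
cleaned <- cleaned[cleaned$AssayType == "assay", ]

print(cleaned[, c("SampleID", "OlinkID", "NPX")])
```

In this toy data set S3 is dropped as a duplicate, CTRL1 as a control sample, S2 for failing QC, and OID00002 as a control assay, leaving S1 and S4 measured on OID00001.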
Important:
When the data set lacks a column marking sample type (e.g. SampleType), one
should remove external control samples based on their sample identifiers.
This function does not auto-detect external control samples based on their
sample identifiers. Please ensure external control samples have been
removed prior to downstream statistical analysis.
When the data set lacks a column marking assay type (e.g. AssayType), one
should remove internal control assays manually. This function does not
auto-detect internal control assays. Please ensure internal control assays
have been removed prior to downstream statistical analysis.
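As a rough sketch of the manual removal described above, the base R snippet below filters a toy data set that has neither a sample type nor an assay type column. The sample identifiers, the assay identifier treated as an internal control, and the helper vector control_assay_ids are hypothetical. Within clean_npx() itself, the control_sample_ids argument is the documented route for passing sample identifiers to exclude.

```r
# Toy data without SampleType or AssayType columns.
# All identifiers below are made up for illustration.
npx <- data.frame(
  SampleID = rep(c("S1", "S2", "NEG_CTRL_1", "PLATE_CTRL_1"), each = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), times = 4),
  NPX      = c(1.1, 0.2, 1.9, 0.3, -0.5, 0.1, 0.0, 0.2),
  stringsAsFactors = FALSE
)

# Identifiers the analyst knows to be controls (hypothetical values).
control_sample_ids <- c("NEG_CTRL_1", "PLATE_CTRL_1")
control_assay_ids  <- c("OID00002")

# Keep rows that belong to neither a control sample nor a control assay.
keep <- !(npx$SampleID %in% control_sample_ids) &
  !(npx$OlinkID %in% control_assay_ids)
manual_clean <- npx[keep, ]

print(manual_clean)
```

Checking the remaining identifiers afterwards, for example with unique(manual_clean$SampleID), is a cheap way to confirm that no control rows slipped through.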
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Kang Dong, Klev Diamanti
## Not run: 
# run check_npx
check_log <- check_npx(
  df = npx_data1
)

# run clean_npx
clean_npx(
  df = npx_data1,
  check_log = check_log
)

# run clean_npx with messages for all steps
clean_npx(
  df = npx_data1,
  check_log = check_log,
  verbose = TRUE
)
## End(Not run)