train_REF_model: Normalize using reference sample

View source: R/file_normalization_with_reference_sample.R

Description

Trains a normalization model from reference samples, using the CytoNorm package.

Usage

train_REF_model(
  df,
  markers_to_normalize = NULL,
  transformList = NULL,
  arcsine_transform = TRUE,
  nQ = 101,
  limit = NULL,
  quantileValues = NULL,
  goal = "mean",
  to_plot = TRUE,
  norm_with_clustering = FALSE,
  seed = NULL,
  nCells = 10000,
  xdim = 10,
  ydim = 10,
  nClus = 10,
  clustering_markers = NULL,
  out_dir = NULL,
  save_model = FALSE
)

Arguments

df

Data frame containing the following columns: file_paths (the full path to each file to be normalized), batch_labels (the batch label of each file), and ref_ids (logical, TRUE for reference samples).
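
For illustration, the base R sketch below builds an input data frame of the expected shape. The file names are hypothetical and relative only for brevity (the function expects full paths), and the column name batch_labels follows the usage example on this page.

```r
# Hypothetical file names; in practice they come from list.files(..., full.names = TRUE)
files <- c("Gated/day1_REF_gated.fcs", "Gated/day1_p1_gated.fcs",
           "Gated/day2_REF_gated.fcs", "Gated/day2_p1_gated.fcs")

df_example <- data.frame(
  file_paths   = files,
  # batch label extracted from the file name ("day1", "day2")
  batch_labels = regmatches(files, regexpr("day[0-9]+", files)),
  # TRUE for reference samples, identified here by "REF" in the name
  ref_ids      = grepl("REF", files),
  stringsAsFactors = FALSE
)
```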

markers_to_normalize

Character vector of marker names to be normalized. Each entry can be a full marker name, e.g. "CD45$" (only the CD45 marker is picked), or a partial one, e.g. "CD" (all markers containing "CD" are used). If NULL (default), all non-mass markers are normalized.
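
The matching described above behaves like regular-expression matching, which is also how clustering_markers is described. The helper select_markers below is hypothetical and only sketches that behavior in base R; it is not the package's internal code, and the marker names are made up.

```r
# Hypothetical marker names as they might appear in an FCS file
markers <- c("CD45", "CD45RA", "CD3", "HLA-DR", "IgD", "Time")

# Keep every marker that matches at least one of the supplied patterns
select_markers <- function(markers, patterns) {
  keep <- Reduce(`|`, lapply(patterns, grepl, x = markers))
  markers[keep]
}

select_markers(markers, "CD45$")            # "$" anchors the end: only "CD45"
select_markers(markers, "CD")               # every marker containing "CD"
select_markers(markers, c("HLA", "IgD"))    # several patterns combined with OR
```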

transformList

Transformation list passed to the flowCore transform function. Default is NULL. Either transformList or arcsine_transform must be defined.

arcsine_transform

Logical, whether the data should be transformed with the arcsine transformation using cofactor 5. Default is TRUE. Either transformList or arcsine_transform must be defined.
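
The arcsine transformation with cofactor 5 divides each raw value by the cofactor and applies the inverse hyperbolic sine. The function name arcsinh_transform below is hypothetical; this is a minimal base R sketch of the formula, not the package's own code.

```r
# Inverse hyperbolic sine transformation: asinh(x / cofactor)
arcsinh_transform <- function(x, cofactor = 5) {
  asinh(x / cofactor)
}

# Hypothetical raw intensities: near-linear close to 0, log-like for large values
raw <- c(0, 5, 50, 500)
arcsinh_transform(raw)
```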

nQ

Numeric, as in CytoNorm, number of quantiles to use. Default = 101, which results in quantiles for every percent of the data.

limit

Numeric, as in CytoNorm: values that the spline is constrained to map onto themselves.

quantileValues

Numeric, as in CytoNorm. If specified, it should be a vector of length nQ with values between 0 and 1, giving the proportions at which the quantiles are computed. If NULL (default), the quantiles are evenly distributed, including 0 and 1.
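
The relation between nQ and quantileValues can be seen in base R: with the default, nQ = 101 evenly spaced values from 0 to 1 give one quantile per percent, while the usage example on this page sets nQ = 2 with custom values. The expression values below are hypothetical, and stats::quantile is used only to illustrate the idea.

```r
# Default behaviour: nQ evenly spaced proportions, including 0 and 1
nQ <- 101
quantileValues <- seq(0, 1, length.out = nQ)   # 0.00, 0.01, ..., 1.00

# Hypothetical transformed expression values for one marker
x <- asinh((1:1000) / 5)

q_default <- quantile(x, probs = quantileValues, names = FALSE)

# Custom setting, as in the usage example: nQ = 2, quantileValues = c(0.05, 0.95)
q_custom <- quantile(x, probs = c(0.05, 0.95), names = FALSE)
```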

goal

Goal distribution. Default is "mean"; it can also be a vector of nQ numeric values or one of the batch labels.
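
To make goal = "mean" and limit more concrete, the base R sketch below computes per-batch quantiles of hypothetical reference data, averages them across batches to obtain goal quantiles, and fits a monotone spline that maps one batch onto the goal. It assumes that the limit values are simply appended as fixed points on both axes and that a monotone Hermite spline (splinefun with method "monoH.FC") is used; it illustrates the idea only and is not CytoNorm's code.

```r
# Hypothetical reference-sample values for one marker in two batches
ref <- list(day1 = asinh((1:500) / 5),
            day2 = asinh((1:500) / 4))

probs <- seq(0, 1, length.out = 11)

# Per-batch quantiles of the reference sample (one column per batch)
batch_q <- sapply(ref, quantile, probs = probs, names = FALSE)

# goal = "mean": average the quantiles across batches
goal_q <- rowMeans(batch_q)

# Assumed handling of limit: the limit values map onto themselves
limit <- c(0, 8)
f_day1 <- splinefun(x = c(limit[1], batch_q[, "day1"], limit[2]),
                    y = c(limit[1], goal_q, limit[2]),
                    method = "monoH.FC")

# Normalized values for batch day1
normalized_day1 <- f_day1(ref$day1)
```

A monotone spline keeps the ordering of cells intact, so normalization shifts and stretches the distribution without reordering values.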

to_plot

Logical, if TRUE, a plot showing all quantiles is generated (using the layout function). If norm_with_clustering = TRUE, FlowSOM clustering quality plots are also generated. Default = TRUE.

norm_with_clustering

Logical, whether the data should be normalized per cluster, using the FlowSOM clustering algorithm. Default is FALSE.

seed

Numeric, random seed set to obtain reproducible results when norm_with_clustering is TRUE. Default is NULL.

nCells

Numeric, the number of cells to use for FlowSOM clustering. This number depends on the total number of FCS files, as by default 1000 cells are used per file. Only used if norm_with_clustering is TRUE.

xdim

Numeric, parameter passed to FlowSOM: width of the SOM grid. Only used if norm_with_clustering is TRUE.

ydim

Numeric, parameter passed to FlowSOM: height of the SOM grid. Only used if norm_with_clustering is TRUE.

nClus

Numeric, exact number of clusters for metaclustering. Only used if norm_with_clustering is TRUE.

clustering_markers

Character vector of marker names to be used for clustering. Each entry can be a full marker name, e.g. "CD45$" (only the CD45 marker is picked), or a partial one, e.g. "CD" (all markers containing "CD" are used). If NULL (default), all markers defined in markers_to_normalize are used. Only used if norm_with_clustering is TRUE.

out_dir

Character, path to the directory where the FlowSOM clustering plot should be saved. If NULL (default), files are saved in file.path(getwd(), "CytoNorm").

save_model

Logical, whether the model should be saved; if TRUE, it is saved to out_dir. Default is FALSE.

Value

The trained model describing the normalization function.

Examples

# Set input directory
dir <- "path/to/project"  # placeholder: replace with the project folder
gate_dir <- file.path(dir, "Gated")

# List the gated files; reference samples contain "REF" in the file name
files_ref <- list.files(gate_dir,
                        pattern = "_gated\\.fcs$",
                        full.names = TRUE,
                        recursive = TRUE)

df <- data.frame("file_paths" = files_ref,
                "batch_labels" = stringr::str_match(files_ref, "day[0-9]*")[,1],
                "ref_ids" = grepl("REF", files_ref))


model <- train_REF_model(df = df,
                        markers_to_normalize = c("CD", "HLA", "IgD",
                                                 "IL", "TN", "MCP", "MIP",
                                                 "Gran", "IFNa"),
                        arcsine_transform = TRUE,
                        nQ = 2,
                        limit = c(0,8),
                        quantileValues = c(0.05, 0.95),
                        goal = "mean",
                        norm_with_clustering = FALSE,
                        save_model = TRUE,
                        clustering_markers = c("CD", "HLA", "IgD"))


prybakowska/CytoQP documentation built on June 28, 2022, 12:36 a.m.