train_REF_model: Normalize using reference sample
In prybakowska/CytoQP: Cytometry data quality control and cleaning

View source: R/file_normalization_with_reference_sample.R

train_REF_model

R Documentation

Normalize using reference sample

Description

Trains a normalization model from reference files, using the CytoNorm package.

Usage

train_REF_model(
  df,
  markers_to_normalize = NULL,
  transformList = NULL,
  arcsine_transform = TRUE,
  nQ = 101,
  limit = NULL,
  quantileValues = NULL,
  goal = "mean",
  to_plot = TRUE,
  norm_with_clustering = FALSE,
  seed = NULL,
  nCells = 10000,
  xdim = 10,
  ydim = 10,
  nClus = 10,
  clustering_markers = NULL,
  out_dir = NULL,
  save_model = FALSE
)

Arguments

`df`	Data frame containing the following columns: file_paths (full paths to the files to be normalized), batch_label (batch label for each file), and ref_ids (logical, TRUE for reference samples).
`markers_to_normalize`	Character vector of marker names to be normalized. An entry can be a full marker name, e.g. "CD45$" (only the CD45 marker will be picked), or a partial name, e.g. "CD" (all markers containing "CD" will be used). If NULL (default), all non-mass markers will be normalized.
`transformList`	Transformation list to pass to the flowCore transform function. Default is NULL. Either transformList or arcsine_transform needs to be defined.
`arcsine_transform`	Logical, whether the data should be transformed with arcsine transformation and cofactor 5. Default is TRUE. Either transformList or arcsine_transform needs to be defined.
`nQ`	Numeric, as in CytoNorm, number of quantiles to use. Default = 101, which results in quantiles for every percent of the data.
`limit`	Numeric, as in CytoNorm, these values are modeled to map onto themselves by the spline.
`quantileValues`	Numeric, as in CytoNorm. If specified, it should be a vector of length nQ with values between 0 and 1, giving the percentages at which the quantiles should be computed. If NULL (default), the quantiles will be evenly distributed, including 0 and 1.
`goal`	Goal distribution. Default "mean", can also be nQ numeric values or one of the batch labels.
`to_plot`	Logical, if TRUE, a plot is generated (using the layout function) showing all quantiles. If norm_with_clustering = TRUE, FlowSOM clustering quality plots will also be generated. Default = TRUE.
`norm_with_clustering`	Logical, whether the data should be normalized using the FlowSOM clustering algorithm. Default is FALSE.
`seed`	Numeric, set to obtain reproducible results, when norm_with_clustering set to TRUE. Default NULL.
`nCells`	Numeric, the number of cells to use for FlowSOM clustering. This number is determined by the total number of fcs files, as by default 1000 cells are used per file. Only used if norm_with_clustering is set to TRUE.
`xdim`	Numeric, parameter to pass to FlowSOM, width of the SOM grid. Only if norm_with_clustering set to TRUE.
`ydim`	Numeric, parameter to pass to FlowSOM, height of the SOM grid. Only if norm_with_clustering set to TRUE.
`nClus`	Numeric, exact number of clusters for metaclustering. Only if norm_with_clustering set to TRUE.
`clustering_markers`	Character vector of marker names to be used for clustering. An entry can be a full marker name, e.g. "CD45$" (only the CD45 marker will be picked), or a partial name, e.g. "CD" (all markers containing "CD" will be used). If NULL (default), all the markers defined in markers_to_normalize will be used. Only if norm_with_clustering set to TRUE.
`out_dir`	Character, path to the directory where the FlowSOM clustering plot should be saved. If NULL (default), files will be saved in file.path(getwd(), "CytoNorm").
`save_model`	Logical, whether the model should be saved. If TRUE, it will be saved to out_dir. Default is FALSE.
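
The marker selection described for markers_to_normalize and clustering_markers can be illustrated with base R alone. The sketch below assumes the entries are treated as regular expressions matched against marker names, which the "CD45$" example suggests but the documentation does not state outright. The marker names and the helper select_markers are hypothetical and are not part of CytoQP.

```r
# Hypothetical marker names, for illustration only
markers <- c("CD45", "CD45RA", "CD3", "HLA-DR", "IgD", "IL6")

# Hypothetical helper (not a CytoQP function):
# keep every marker name that matches at least one pattern
select_markers <- function(patterns, marker_names) {
  pattern <- paste(patterns, collapse = "|")
  grep(pattern, marker_names, value = TRUE)
}

# "$" anchors the match at the end of the name, so "CD45RA" is excluded
anchored <- select_markers("CD45$", markers)

# An unanchored pattern keeps every name containing "CD"
broad <- select_markers("CD", markers)

# Several patterns are combined; a name matching any of them is kept
combined <- select_markers(c("HLA", "IgD"), markers)

print(anchored)
print(broad)
print(combined)
```

Under this assumption, anchoring a pattern with "$" is the way to avoid picking up markers that merely share a prefix with the intended one.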

Value

A model describing the normalization function.

Examples

# Set input directory (`dir` must already hold the path to the project directory)
gate_dir <- file.path(dir, "Gated")

# Define reference samples
files_ref <- list.files(gate_dir,
                        pattern = "_gated\\.fcs$",
                        full.names = TRUE,
                        recursive = TRUE)

df <- data.frame("file_paths" = files_ref,
                "batch_labels" = stringr::str_match(files_ref, "day[0-9]*")[,1],
                "ref_ids" = grepl("REF", files_ref))


model <- train_REF_model(df = df,
                        markers_to_normalize = c("CD", "HLA", "IgD",
                                                 "IL", "TN", "MCP", "MIP",
                                                 "Gran", "IFNa"),
                        arcsine_transform = TRUE,
                        nQ = 2,
                        limit = c(0,8),
                        quantileValues = c(0.05, 0.95),
                        goal = "mean",
                        norm_with_clustering = FALSE,
                        save_model = TRUE,
                        clustering_markers = c("CD", "HLA", "IgD"))
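
To make the nQ = 2 and quantileValues = c(0.05, 0.95) settings in the call above more concrete, the following sketch computes those two quantiles on simulated intensities. It uses base R and made-up data only, and it assumes that "arcsine transformation and cofactor 5" means asinh(x / 5) and that goal = "mean" averages the quantiles across batches. It illustrates the quantities involved and is not the CytoNorm or CytoQP implementation.

```r
set.seed(1)

# Simulated raw intensities for one marker in two batches (not real data)
batch1 <- rexp(1000, rate = 1 / 20)
batch2 <- rexp(1000, rate = 1 / 35)

# Assumed form of the transformation: inverse hyperbolic sine, cofactor 5
cofactor <- 5
t1 <- asinh(batch1 / cofactor)
t2 <- asinh(batch2 / cofactor)

# Quantiles at the probabilities passed as quantileValues in the example
probs <- c(0.05, 0.95)
q1 <- quantile(t1, probs = probs, names = FALSE)
q2 <- quantile(t2, probs = probs, names = FALSE)

# Assumed meaning of goal = "mean": per-quantile average over the batches
goal <- (q1 + q2) / 2

# Evenly spaced probabilities including 0 and 1, as described for
# quantileValues = NULL with the default nQ = 101
default_probs <- seq(0, 1, length.out = 101)

print(rbind(batch1 = q1, batch2 = q2, goal = goal))
```

With only two quantiles, each batch is aligned to the goal at the 5% and 95% points, which gives a coarser mapping than the default of 101 quantiles.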

prybakowska/CytoQP documentation built on June 28, 2022, 12:36 a.m.