prepareData: Prepare data
In lmweber/diffcyt: Differential discovery in high-dimensional cytometry via high-resolution clustering

prepareData

R Documentation

Prepare data

Description

Prepare data into the format required by the diffcyt pipeline

Usage

prepareData(
  d_input,
  experiment_info,
  marker_info,
  cols_to_include = NULL,
  subsampling = FALSE,
  n_sub = NULL,
  seed_sub = NULL
)

Arguments

`d_input`	Input data. Must be a `flowSet` or a list of `flowFrames`, `DataFrames`, `data.frames`, or matrices (one `flowFrame` or list item per sample).
`experiment_info`	`data.frame`, `DataFrame`, or `tbl_df` of experiment information, for example sample IDs and group IDs. Must contain a column named `sample_id`.
`marker_info`	`data.frame`, `DataFrame`, or `tbl_df` of marker information for each column of data. This should contain columns named `marker_name` and `marker_class`. The columns contain: (i) marker names (and any other column names); and (ii) a factor indicating the marker class for each column (with entries `"type"`, `"state"`, or `"none"`).
`cols_to_include`	Logical vector indicating which columns to include from the input data. Default = all columns.
`subsampling`	Whether to use random subsampling to select an equal number of cells from each sample. Default = FALSE.
`n_sub`	Number of cells to select from each sample by random subsampling, if `subsampling = TRUE`. Default = number of cells in smallest sample.
`seed_sub`	Random seed for subsampling. Set to an integer value to generate reproducible results. Default = `NULL`.

Details

Functions in the diffcyt analysis pipeline assume that input data is provided as a SummarizedExperiment object, which contains a single matrix of expression values, together with row and column meta-data.

This function accepts a flowSet or a list of flowFrames, data.frames, or matrices as input (i.e. one flowFrame or list item per sample). The function then concatenates the data tables into a single matrix of values, and adds row and column meta-data.
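The concatenation step can be pictured with a minimal base R sketch. This is an illustration only, not the package's implementation; the object names (`toy_input`, `d_combined`) and the toy dimensions are assumptions made for the example.

```r
# Hypothetical toy input: two samples with different numbers of cells
set.seed(1)
toy_input <- list(
  sampleA = matrix(rnorm(6), nrow = 3, ncol = 2,
                   dimnames = list(NULL, c("marker01", "marker02"))),
  sampleB = matrix(rnorm(4), nrow = 2, ncol = 2,
                   dimnames = list(NULL, c("marker01", "marker02")))
)

# Number of cells (rows) in each sample
n_cells <- vapply(toy_input, nrow, integer(1))

# Stack all samples into a single matrix: rows = cells, columns = markers
d_combined <- do.call(rbind, toy_input)

# Per-row sample labels, repeated once per cell, to serve as row meta-data
sample_id <- factor(rep(names(toy_input), times = n_cells),
                    levels = names(toy_input))
```

Here `d_combined` has one row per cell across both samples, and `sample_id` records which sample each row came from.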

Row meta-data should be provided as a data frame named experiment_info, containing columns of relevant experiment information, such as sample IDs and group IDs (for each sample). This must contain at least a column named sample_id.

Column meta-data should be provided as a data frame named marker_info, containing the following columns of marker information. The column names must be as shown.

marker_name: protein marker names (and column names for any other columns)
marker_class: factor indicating the protein marker class for each column of data (usually, entries will be either "type", "state", or "none")

The split into 'cell type' and 'cell state' markers is crucial for the analysis. Cell type markers are used to define cell populations by clustering, and to test for differential abundance of cell populations; while cell state markers are used to test for differential states within cell populations.
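As a rough sketch of how the marker classes might be assigned, the following builds a small marker information table in which a non-marker column is labelled "none". The column names (`Time`, `CD3`, and so on) and the object name `marker_info_toy` are hypothetical and chosen only for illustration.

```r
# Hypothetical column names for one sample's data table
col_names <- c("Time", "CD3", "CD4", "pSTAT1", "pSTAT3")

# One row per data column; marker_class uses the three documented entries
marker_info_toy <- data.frame(
  marker_name = col_names,
  marker_class = factor(c("none", "type", "type", "state", "state"),
                        levels = c("type", "state", "none")),
  stringsAsFactors = FALSE
)

# Count how many columns fall into each class
class_counts <- table(marker_info_toy$marker_class)
```

In this toy table, two columns would be used for clustering ("type"), two for differential state testing ("state"), and one is flagged as not being a protein marker ("none").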

The optional argument cols_to_include allows unnecessary columns (e.g. any columns not containing protein markers) to be discarded.
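One possible way to build such a logical vector is sketched below. The column names and the helper vector `non_marker_cols` are hypothetical; only the argument name `cols_to_include` comes from the documentation above.

```r
# Hypothetical column names in the input data
col_names <- c("Time", "CD3", "CD4", "pSTAT1", "Event_length")

# Hypothetical set of columns that do not contain protein markers
non_marker_cols <- c("Time", "Event_length")

# Logical vector with one entry per column: TRUE = keep, FALSE = discard
cols_to_include <- !(col_names %in% non_marker_cols)

# The vector could then be supplied as
# prepareData(..., cols_to_include = cols_to_include)
```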

Optionally, random subsampling can be used to select an equal number of cells from each sample (subsampling = TRUE). This can be useful when there are large differences in total numbers of cells per sample, since it ensures that samples with relatively large numbers of cells do not dominate the clustering. However, subsampling should generally not be used when rare cell populations are of interest, due to the significant loss of information if cells from the rare population are discarded.
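The idea behind equal-size subsampling can be sketched in base R as follows. This is an illustration under assumed toy data, not the package's internal code; the object names `toy_input` and `toy_sub` are invented for the example.

```r
# Hypothetical toy input: one small and one large sample
set.seed(42)
toy_input <- list(
  small = matrix(rnorm(200), ncol = 2),   # 100 cells
  large = matrix(rnorm(2000), ncol = 2)   # 1000 cells
)

# Default choice described above: number of cells in the smallest sample
n_sub <- min(vapply(toy_input, nrow, integer(1)))

# Randomly select n_sub rows (cells) from each sample, without replacement
toy_sub <- lapply(toy_input, function(d) {
  keep <- sample(seq_len(nrow(d)), n_sub)
  d[keep, , drop = FALSE]
})

# Number of cells per sample after subsampling
n_cells_sub <- vapply(toy_sub, nrow, integer(1))
```

After this step both samples contain 100 cells, so the larger sample no longer contributes ten times as many cells to the clustering. Setting the seed beforehand makes the selection reproducible, which is the role of `seed_sub`.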

Value

d_se: Returns data as a SummarizedExperiment containing a single matrix of data (expression values) in the assays slot, together with row meta-data (experiment information) and column meta-data (marker information). The metadata slot also contains the experiment_info data frame, and a vector n_cells of the number of cells per sample; these can be accessed with metadata(d_se)$experiment_info and metadata(d_se)$n_cells.

Examples

# For a complete workflow example demonstrating each step in the 'diffcyt' pipeline, 
# see the package vignette.

# Function to create random data (one sample).
# Note: n is the total number of values, so each sample has n / ncol cells
# (1000 cells with the default arguments).
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
  colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
  d
}

# Create random data (without differential signal)
set.seed(123)
d_input <- list(
  sample1 = d_random(), 
  sample2 = d_random(), 
  sample3 = d_random(), 
  sample4 = d_random()
)

experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)), 
  group_id = factor(c("group1", "group1", "group2", "group2")), 
  stringsAsFactors = FALSE
)

marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:20)), 
  marker_name = paste0("marker", sprintf("%02d", 1:20)), 
  marker_class = factor(c(rep("type", 10), rep("state", 10)), 
                        levels = c("type", "state", "none")), 
  stringsAsFactors = FALSE
)

# Prepare data
d_se <- prepareData(d_input, experiment_info, marker_info)

lmweber/diffcyt documentation built on Feb. 10, 2025, 5:15 p.m.
