calcMediansByClusterMarker: Calculate medians (by cluster and marker)

calcMediansByClusterMarker    R Documentation

Calculate medians (by cluster and marker)

Description

Calculate medians for each cluster-marker combination

Usage

calcMediansByClusterMarker(d_se)

Arguments

d_se

Data object from previous steps, in SummarizedExperiment format, containing cluster labels as a column in the row meta-data (from generateClusters). Column meta-data is assumed to contain a factor marker_class.

Details

Calculate median marker expression for each cluster, across all samples (i.e. medians for each cluster-marker combination).
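As a rough illustration of what "medians for each cluster-marker combination" means, the following base R sketch computes the same kind of summary on a small hand-made matrix. It does not call the package and is not the package's implementation; the objects expr, cluster_id, and medians are hypothetical names with made-up values.

```r
# Toy data (made up): 6 cells (rows) x 3 markers (columns).
# matrix() fills column by column, so each line below is one marker.
expr <- matrix(
  c(1, 2, 3, 10, 20, 30,   # marker01
    5, 5, 8,  1,  2,  3,   # marker02
    0, 4, 2,  7,  9,  8),  # marker03
  ncol = 3,
  dimnames = list(NULL, c("marker01", "marker02", "marker03"))
)

# Hypothetical cluster label for each cell
cluster_id <- factor(c(1, 1, 1, 2, 2, 2))

# For each cluster, take the column-wise median over the cells in that cluster.
# sapply() returns markers x clusters, so transpose to clusters x markers.
medians <- t(sapply(
  split(seq_len(nrow(expr)), cluster_id),
  function(rows) apply(expr[rows, , drop = FALSE], 2, median)
))

medians
#   marker01 marker02 marker03
# 1        2        5        2
# 2       20        2        8
```

The result has one row per cluster and one column per marker, which is the layout described for the returned object.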

The data object is assumed to contain a factor marker_class in the column meta-data (see prepareData), which indicates the protein marker class for each column of data ("type", "state", or "none"). Cluster medians are calculated for all markers.

The medians by cluster and marker are required for plotting purposes.

Variables id_type_markers and id_state_markers are saved in the metadata slot of the output object. These can be used to identify the 'cell type' and 'cell state' markers in the sequence of markers (columns) in the output object, which is useful in later steps of the 'diffcyt' pipeline.
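The sketch below shows one way such identifiers could be used to pick out the cell type and cell state columns. It assumes that id_type_markers and id_state_markers are logical vectors with one entry per marker column; that is an assumption for illustration, since the text above does not state their type. The matrix medians and the factor marker_class are made-up stand-ins, not objects produced by the package.

```r
# Hypothetical marker classes for 4 markers (same levels as in the documentation)
marker_class <- factor(
  c("type", "type", "state", "none"),
  levels = c("type", "state", "none")
)

# Assumption: identifiers are logical vectors, one entry per marker column
id_type_markers  <- marker_class == "type"
id_state_markers <- marker_class == "state"

# Made-up medians: 2 clusters (rows) x 4 markers (columns)
medians <- matrix(
  1:8, nrow = 2,
  dimnames = list(c("1", "2"), paste0("marker0", 1:4))
)

# Subset columns by marker class; drop = FALSE keeps the matrix shape
medians_type  <- medians[, id_type_markers,  drop = FALSE]
medians_state <- medians[, id_state_markers, drop = FALSE]

colnames(medians_type)   # "marker01" "marker02"
colnames(medians_state)  # "marker03"
```

Markers with class "none" are selected by neither identifier, so they remain in the full table but are excluded from both subsets.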

Results are returned as a new SummarizedExperiment object, where rows = clusters, columns = markers, and assay = values (median marker expression values).

Value

d_medians_by_cluster_marker: SummarizedExperiment object, where rows = clusters, columns = markers, assay = values (marker expression values). The metadata slot contains variables id_type_markers and id_state_markers, which can be accessed with metadata(d_medians_by_cluster_marker)$id_type_markers and metadata(d_medians_by_cluster_marker)$id_state_markers.

Examples

# For a complete workflow example demonstrating each step in the 'diffcyt' pipeline, 
# see the package vignette.

# Function to create random data (one sample)
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
  colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
  d
}

# Create random data (without differential signal)
set.seed(123)
d_input <- list(
  sample1 = d_random(), 
  sample2 = d_random(), 
  sample3 = d_random(), 
  sample4 = d_random()
)

experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)), 
  group_id = factor(c("group1", "group1", "group2", "group2")), 
  stringsAsFactors = FALSE
)

marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:20)), 
  marker_name = paste0("marker", sprintf("%02d", 1:20)), 
  marker_class = factor(c(rep("type", 10), rep("state", 10)), 
                        levels = c("type", "state", "none")), 
  stringsAsFactors = FALSE
)

# Prepare data
d_se <- prepareData(d_input, experiment_info, marker_info)

# Transform data
d_se <- transformData(d_se)

# Generate clusters
d_se <- generateClusters(d_se)

# Calculate medians (by cluster and marker)
d_medians_by_cluster_marker <- calcMediansByClusterMarker(d_se)


lmweber/diffcyt documentation built on March 19, 2024, 5:24 a.m.