RandomF_FCS: Random Forest classifier for supervised demarcation of groups...


View source: R/RandomF_FCS.R

Description

Random Forest classifier for supervised demarcation of groups using flow cytometry data.

Usage

RandomF_FCS(
  x,
  sample_info,
  sample_col = "name",
  target_label,
  downsample = 0,
  classification_type = "sample",
  param = c("FL1-H", "FL3-H", "FSC-H", "SSC-H"),
  p_train = 0.75,
  seed = 777,
  cleanFCS = FALSE,
  timesplit = 0.1,
  TimeChannel = "Time",
  plot_fig = FALSE,
  method = "rf"
)

Arguments

x

flowSet object where the necessary metadata for classification is included in the phenoData.

sample_info

Sample information necessary for the classification. It must contain a column (named "name" by default, see sample_col) whose values match the sample names of the FCS files stored in the flowSet.

sample_col

Column name of the sample names in sample_info. Defaults to "name".

target_label

Column name in the sample_info data frame that should be predicted based on the flow cytometry data.

downsample

Number of cells to which each sample should be downsampled. By default (0), samples are downsampled to the size of the sample with the lowest number of cells.

classification_type

Whether to perform sample-level or single-cell-level classification. Defaults to "sample".

param

Parameters to base classification on.

p_train

Fraction of the data set that should be used for training the model. Defaults to 0.75.

seed

Random seed to be used during the analysis. Defaults to 777.

cleanFCS

Indicate whether outlier removal should be conducted prior to model estimation. Defaults to FALSE. It is recommended that samples contain more than 500 cells. Denoising is based on the parameters specified in 'param'.

timesplit

Fraction of a timestep used in flowAI for denoising. Defaults to 0.1. Please consult the 'flowAI::flow_auto_qc' function for more information.

TimeChannel

Name of the time channel in the FCS files. This can differ between flow cytometers. Defaults to "Time". You can check this with colnames(flowSet).

plot_fig

Should the confusion matrix and the overall performance statistics on the test data partition be visualized? Defaults to FALSE.

method

Method used by caret::train for learning. Defaults to "rf" (Random Forest).
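
To make the roles of downsample, p_train and seed more concrete, the following is a minimal base-R sketch of the kind of preprocessing these arguments describe. It is not the package's implementation: the data frame `cells`, its columns and the use of `sample()` for both steps are assumptions made purely for illustration.

```r
# Hypothetical stand-in for per-cell data: three samples with unequal cell counts.
set.seed(777)  # same value as the documented default for `seed`
cells <- data.frame(
  sample = rep(c("A", "B", "C"), times = c(500, 300, 800)),
  FL1_H = rnorm(1600),
  FL3_H = rnorm(1600)
)

# Downsample every sample to the size of the smallest one (300 cells here),
# mirroring the documented default behaviour of `downsample`.
n_min <- min(table(cells$sample))
keep <- unlist(lapply(
  split(seq_len(nrow(cells)), cells$sample),
  function(idx) sample(idx, n_min)
))
balanced <- cells[keep, ]

# Use a fraction p_train of the balanced cells for training, the rest for testing.
p_train <- 0.75
train_idx <- sample(nrow(balanced), size = floor(p_train * nrow(balanced)))
train <- balanced[train_idx, ]
test <- balanced[-train_idx, ]

table(balanced$sample)                      # cells per sample after downsampling
c(train = nrow(train), test = nrow(test))   # sizes of the two partitions
```

Fixing the seed before the random steps makes both the downsampling and the train/test partition reproducible between runs.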

Examples

# 1. Example with environmental data:

# Load raw data (imported using flowCore)
data(flowData)

# Format necessary metadata
metadata <- data.frame(
  names = flowCore::sampleNames(flowData),
  do.call(rbind, lapply(strsplit(flowCore::sampleNames(flowData), "_"), rbind))
)
colnames(metadata) <- c("Sample_names", "Cycle_nr", "Location", "day",
                        "timepoint", "Staining", "Reactor_phase", "replicate")

# Run Random Forest classifier to predict the Reactor phase based on the
# single-cell FCM data
model_rf <- RandomF_FCS(
  flowData,
  sample_info = metadata[1:10, ],
  sample_col = "Sample_names",
  target_label = "Reactor_phase",
  downsample = 10
)

# Make a model prediction on new data and report the contingency table of predictions
model_pred <- RandomF_predict(x = model_rf[[1]], new_data = flowData[1], cleanFCS = FALSE)
print(model_pred)

# 2. Example with synthetic community data
# Load flow cytometry data of two strains, each with 5,000 cells measured
data(flowData_ax)

# Quickly generate the necessary metadata
metadata_syn <- data.frame(name = flowCore::sampleNames(flowData_ax),
                       labels = flowCore::sampleNames(flowData_ax))

# Run the Random Forest model on 100 cells of each strain
model_rf_syn <-
  RandomF_FCS(
    flowData_ax,
    sample_info = metadata_syn,
    sample_col = "name",
    target_label = "labels",
    downsample = 100,
    plot_fig = TRUE
  )

# Make predictions on each of the samples or on new data of the mixed communities
model_pred_syn <- RandomF_predict(x = model_rf_syn[[1]], new_data = flowData_ax, cleanFCS = FALSE)
print(model_pred_syn)

rprops/Phenoflow_package documentation built on Sept. 22, 2020, 5:43 p.m.