Description
Random Forest classifier for supervised demarcation of groups using flow cytometry data.
Usage

RandomF_FCS(
x,
sample_info,
sample_col = "name",
target_label,
downsample = 0,
classification_type = "sample",
param = c("FL1-H", "FL3-H", "FSC-H", "SSC-H"),
p_train = 0.75,
seed = 777,
cleanFCS = FALSE,
timesplit = 0.1,
TimeChannel = "Time",
plot_fig = FALSE,
method = "rf"
)
Arguments

x: flowSet object where the metadata necessary for classification is included in the phenoData.
sample_info: Sample information necessary for the classification. Must contain a column (named by sample_col, "name" by default) whose values match the sample names of the FCS files stored in the flowSet.
sample_col: Name of the column in sample_info that holds the sample names. Defaults to "name".
target_label: Name of the column in sample_info that should be predicted from the flow cytometry data.
downsample: Sample size (number of cells) to which each sample should be downsampled. The default, 0, downsamples every sample to the size of the sample with the lowest number of cells.
classification_type: Whether to perform sample-level or single-cell-level classification. Defaults to "sample" (sample level).
param: Parameters (channels) on which to base the classification. Defaults to c("FL1-H", "FL3-H", "FSC-H", "SSC-H").
p_train: Fraction of the data set used for training the model (e.g. 0.75 = 75%). The remainder forms the test partition.
seed: Random seed used during the analysis. Defaults to 777.
cleanFCS: Whether outlier removal should be conducted prior to model estimation. Denoising is based on the parameters specified in param. Defaults to FALSE. When enabling it, make sure that samples contain more than 500 cells.
timesplit: Fraction of timestep used by flowAI for denoising. See flowAI::flow_auto_qc for more information.
TimeChannel: Name of the time channel in the FCS files. This can differ between flow cytometers. Defaults to "Time". You can check the channel names with colnames(x).
plot_fig: Whether the confusion matrix and the overall performance statistics on the test data partition should be visualized. Defaults to FALSE.
method: Method used by caret::train for learning. Defaults to "rf" (random forest).
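The documented behaviour of downsample and p_train can be pictured with a small base R sketch. It is not the code inside RandomF_FCS: the cell tables, the helper downsample_cells() and the per-cell random split are hypothetical, chosen only to show what "downsample to the smallest sample" and "use 75% of the data for training" mean. The package may partition the data differently, for instance stratified by the target label as caret::createDataPartition() does.

```r
set.seed(777)  # same default seed as RandomF_FCS, used here only for reproducibility

# Hypothetical cell tables standing in for the FCS files of a flowSet
cells_per_sample <- list(
  s1 = data.frame(FSC = rnorm(1200), SSC = rnorm(1200)),
  s2 = data.frame(FSC = rnorm(800),  SSC = rnorm(800)),
  s3 = data.frame(FSC = rnorm(950),  SSC = rnorm(950))
)

# Hypothetical helper: downsample = 0 uses the smallest sample size,
# any other value draws that many cells from every sample
downsample_cells <- function(samples, downsample = 0) {
  n <- if (downsample == 0) min(vapply(samples, nrow, numeric(1))) else downsample
  lapply(samples, function(df) df[sample(nrow(df), n), , drop = FALSE])
}

ds <- downsample_cells(cells_per_sample)
sapply(ds, nrow)  # every sample now holds 800 cells

# Pool the downsampled cells, keeping track of the sample they came from
pooled <- do.call(rbind, Map(function(df, id) cbind(df, sample = id), ds, names(ds)))

# Hypothetical random split of the pooled cells with p_train = 0.75
p_train <- 0.75
train_idx <- sample(nrow(pooled), size = floor(p_train * nrow(pooled)))
train <- pooled[train_idx, ]
test  <- pooled[-train_idx, ]
```

Passing a positive value, e.g. downsample_cells(cells_per_sample, downsample = 100), draws 100 cells per sample instead. The sketch stops with an error if that value exceeds a sample's size, because sample() without replacement cannot draw more elements than it is given.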
Examples

# 1. Example with environmental data:
# Load raw data (imported using flowCore)
data(flowData)
# Format necessary metadata
metadata <- data.frame(names = flowCore::sampleNames(flowData),
do.call(rbind, lapply(strsplit(flowCore::sampleNames(flowData),"_"), rbind)))
colnames(metadata) <- c("Sample_names", "Cycle_nr", "Location", "day",
"timepoint", "Staining", "Reactor_phase", "replicate")
# Run Random Forest classifier to predict the Reactor phase based on the
# single-cell FCM data
model_rf <- RandomF_FCS(flowData, sample_info = metadata[1:10, ], sample_col = "Sample_names",
target_label = "Reactor_phase",
downsample = 10)
# Make a model prediction on new data and report the contingency table of predictions
model_pred <- RandomF_predict(x = model_rf[[1]], new_data = flowData[1], cleanFCS = FALSE)
print(model_pred)
# 2. Example with synthetic community data
# Load flow cytometry data of two strains, with 5,000 cells measured for each strain
data(flowData_ax)
# Quickly generate the necessary metadata
metadata_syn <- data.frame(name = flowCore::sampleNames(flowData_ax),
labels = flowCore::sampleNames(flowData_ax))
# Run the Random Forest model on 100 cells of each strain
model_rf_syn <-
RandomF_FCS(
flowData_ax,
sample_info = metadata_syn,
sample_col = "name",
target_label = "labels",
downsample = 100,
plot_fig = TRUE
)
# Make predictions on each of the samples or on new data of the mixed communities
model_pred_syn <- RandomF_predict(x = model_rf_syn[[1]], new_data = flowData_ax, cleanFCS = FALSE)
print(model_pred_syn)
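The overall accuracy reported alongside the confusion matrix (see plot_fig) is the share of observations on the diagonal of the contingency table of predictions. A minimal base R sketch with made-up labels, assuming predictions and reference labels are available as factors with the same levels; it does not reproduce the exact output format of RandomF_FCS or RandomF_predict. caret::confusionMatrix() computes the same accuracy together with further statistics such as Cohen's kappa.

```r
# Made-up reference and predicted labels for a hypothetical test partition
truth <- factor(c("A", "A", "B", "B", "B", "A"), levels = c("A", "B"))
pred  <- factor(c("A", "B", "B", "B", "A", "A"), levels = c("A", "B"))

# Contingency (confusion) table: rows = predicted class, columns = reference class
conf_mat <- table(Prediction = pred, Reference = truth)
print(conf_mat)

# Overall accuracy = correctly classified observations / all observations
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
round(accuracy, 3)  # 0.667
```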