View source: R/impute_randomforest.R
| impute_randomforest | R Documentation |
impute_randomforest performs imputation for missing values in the data using the random
forest-based method implemented in the missForest package.
impute_randomforest(
  data,
  sample,
  grouping,
  intensity_log2,
  retain_columns = NULL,
  ...
)
data |
A data frame that contains the input variables. This should include columns for the sample names, precursor or peptide identifiers, and intensity values. |
sample |
A character column in the |
grouping |
A character column in the |
intensity_log2 |
A numeric column in the |
retain_columns |
A character vector indicating which columns should be retained from the
input data frame. These columns will be preserved in the output alongside the imputed values.
By default, no additional columns are retained ( |
... |
Additional parameters to pass to the |
The function imputes missing values with random forests, predicting each missing value from the other available values in the dataset. For each variable with missing data, a random forest is trained on the observations where that variable is present, using the remaining variables as predictors, and the fitted model then predicts the missing entries. missForest repeats this procedure iteratively until its stopping criterion is met or the maximum number of iterations is reached.
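The mechanism described above can be illustrated by calling missForest directly on a small wide table. This is a minimal sketch, not the internals of impute_randomforest: the toy data frame, its column names (s1 to s4) and the positions of the missing values are hypothetical, and how impute_randomforest arranges its input before calling missForest is not shown here.

```r
# Minimal sketch: random-forest imputation with missForest on toy data.
# The data, column names and NA positions are hypothetical.
library(missForest)

set.seed(42)

# 30 observations of 4 correlated numeric variables
base <- rnorm(30, mean = 20, sd = 2)
toy <- data.frame(
  s1 = base + rnorm(30, sd = 0.3),
  s2 = base + rnorm(30, sd = 0.3),
  s3 = base + rnorm(30, sd = 0.3),
  s4 = base + rnorm(30, sd = 0.3)
)

# Remove a few values at fixed positions
toy_missing <- toy
toy_missing$s1[c(2, 11, 25)] <- NA
toy_missing$s3[c(5, 17)] <- NA

# Each variable with gaps is predicted from the other variables,
# and the procedure is iterated up to maxiter times
fit <- missForest(toy_missing, ntree = 50, maxiter = 5)

imputed <- fit$ximp                # completed data frame
was_missing <- is.na(toy_missing)  # logical mask of the imputed cells
oob_error <- fit$OOBerror          # out-of-bag error estimate
```

The `was_missing` mask plays the same role as the imputed column in the output of impute_randomforest: it records which values were filled in rather than measured.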
In addition to the imputed values, users can choose to retain additional columns from the original input data frame that were not part of the imputation process.
This function allows passing additional parameters to the underlying missForest function,
such as controlling the number of trees used in the random forest models or specifying the
stopping criteria. For a full list of parameters, refer to the missForest documentation.
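As a hedged illustration of passing extra parameters through ..., the sketch below reuses the synthetic data from the Examples section and forwards ntree and maxiter, two documented missForest arguments. The values 50 and 5 are arbitrary choices for illustration, not recommendations.

```r
# Sketch: forwarding missForest arguments through `...`.
# ntree and maxiter are missForest arguments; the values are arbitrary.
library(protti)

set.seed(123)

data <- create_synthetic_data(
  n_proteins = 10,
  frac_change = 0.5,
  n_replicates = 4,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)

data_imputed <- impute_randomforest(
  data,
  sample = sample,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing,
  ntree = 50,   # number of trees per forest
  maxiter = 5   # upper limit on missForest iterations
)
```

Fewer trees and iterations shorten the run time, usually at some cost in imputation accuracy, so it is worth checking the result against the defaults on your own data.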
To enable parallelisation, ensure that the doParallel package is installed and loaded:
install.packages("doParallel")
library(doParallel)
Then register the desired number of cores for parallel processing:
registerDoParallel(cores = 6)
To use parallelisation during the imputation, pass parallelize = "variables"
to impute_randomforest; the argument is forwarded to the missForest function via ...
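Put together, the parallelisation steps above might look like the following sketch. It assumes doParallel is already installed, uses two cores as an arbitrary choice, and reuses the synthetic data from the Examples section. stopImplicitCluster() is a doParallel helper, added here only to release the workers afterwards.

```r
# Sketch: parallel imputation with a registered doParallel backend.
# The core count is an arbitrary choice for illustration.
library(protti)
library(doParallel)

registerDoParallel(cores = 2)

data <- create_synthetic_data(
  n_proteins = 10,
  frac_change = 0.5,
  n_replicates = 4,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)

data_imputed_parallel <- impute_randomforest(
  data,
  sample = sample,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing,
  parallelize = "variables"  # forwarded to missForest via `...`
)

# Release the implicitly created workers
stopImplicitCluster()
```

On a dataset this small the overhead of starting workers may outweigh any speed-up; parallelisation is more likely to pay off on larger inputs.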
A data frame that contains an imputed_intensity column with the imputed values
and an imputed column indicating whether each value was imputed (TRUE) or not
(FALSE), in addition to any columns retained via retain_columns.
Elena Krismer
Stekhoven, D.J., & Bühlmann, P. (2012). MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. https://doi.org/10.1093/bioinformatics/btr597
set.seed(123) # Makes example reproducible

# Create example data
data <- create_synthetic_data(
  n_proteins = 10,
  frac_change = 0.5,
  n_replicates = 4,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)

head(data, n = 24)

# Perform imputation
data_imputed <- impute_randomforest(
  data,
  sample = sample,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing
)

head(data_imputed, n = 24)