impute_na | R Documentation |
This function imputes missing values using a user-specified imputation method.
impute_na(
  df,
  method = "minProb",
  tune_sigma = 1,
  q = 0.01,
  maxiter = 10,
  ntree = 20,
  n_pcs = 2,
  seed = NULL
)
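df | A raw_df object or a norm_df object.
method | Imputation method to use. Default is "minProb".
tune_sigma | A scalar used in the minProb method. Default is 1.
q | A scalar used in the minProb and minDet methods. Default is 0.01.
maxiter | Maximum number of iterations to be performed when using the RF method. Default is 10.
ntree | Number of trees to grow in each forest when using the RF method. Default is 20.
n_pcs | Number of principal components to calculate when using the SVD method. Default is 2.
seed | Numerical. Random number seed. Default is NULL.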
Ideally, you should first remove proteins with high levels of missing data using the filterbygroup_na function before running impute_na on the raw_df object or the norm_df object.
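As a rough illustration of the filtering idea, the base R sketch below drops proteins (rows) whose fraction of missing values exceeds a cutoff. The toy matrix, the `max_na_frac` cutoff, and the rule of using the overall fraction of missing values are all assumptions made for this example; filterbygroup_na works on groups and its actual logic may differ.

```r
# Toy intensity matrix: rows are proteins, columns are samples (hypothetical data).
toy <- matrix(
  c(20.1, NA,   19.8, 20.5,
    NA,   NA,   NA,   18.2,
    22.3, 22.0, NA,   21.7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("protA", "protB", "protC"), paste0("S", 1:4))
)

# Fraction of missing values per protein.
na_frac <- rowMeans(is.na(toy))

# Keep proteins with at most 50% missing values (hypothetical cutoff).
max_na_frac <- 0.5
toy_filtered <- toy[na_frac <= max_na_frac, , drop = FALSE]
```

Here `protB` is missing in three of four samples, so it is the only protein removed before imputation.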
The impute_na function imputes missing values using a user-specified imputation method from the available options: minProb, minDet, kNN, RF, and SVD.
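To give a feel for what a deterministic-minimum method such as minDet does, here is a minimal base R sketch. It assumes that each sample's missing values are replaced by the q-th quantile of that sample's observed intensities. The helper `impute_min_det_sketch` and the toy matrix are hypothetical and are not part of promor or imputeLCMD.

```r
# Replace each sample's missing values with that sample's q-th quantile
# of observed intensities (a sketch of the deterministic-minimum idea).
impute_min_det_sketch <- function(mat, q = 0.01) {
  low_q <- apply(mat, 2, quantile, probs = q, na.rm = TRUE)
  for (j in seq_len(ncol(mat))) {
    mat[is.na(mat[, j]), j] <- low_q[j]
  }
  mat
}

# Toy matrix: rows are proteins, columns are samples (hypothetical data).
toy <- matrix(
  c(20, 21, NA, 23,
    18, NA, 19, 22),
  ncol = 2,
  dimnames = list(NULL, c("S1", "S2"))
)

imputed <- impute_min_det_sketch(toy, q = 0.01)

# Check that no missing values remain.
anyNA(imputed)
```

A small `q` places the imputed values near the low end of each sample's intensity range, which reflects the assumption that values are missing because they fall below the detection limit.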
Note: Some imputation methods may require that the data be normalized prior to imputation.
Make sure to fix the random number seed with seed for reproducibility.
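The effect of fixing the seed can be illustrated with a base R sketch. The helper `impute_random_sketch` is hypothetical: it fills missing values with random draws from a normal distribution centred on a low quantile, loosely in the spirit of a probabilistic-minimum method. Its use of `q` and `tune_sigma` is an assumption for illustration and not the package's implementation.

```r
impute_random_sketch <- function(x, q = 0.01, tune_sigma = 1, seed = NULL) {
  # Fix the random number generator state only when a seed is supplied.
  if (!is.null(seed)) set.seed(seed)
  miss <- is.na(x)
  centre <- quantile(x, probs = q, na.rm = TRUE)
  spread <- tune_sigma * sd(x, na.rm = TRUE)
  # Fill missing entries with random draws around the low quantile.
  x[miss] <- rnorm(sum(miss), mean = centre, sd = spread)
  x
}

# Toy intensity vector with two missing values (hypothetical data).
x <- c(20.4, NA, 19.7, 21.2, NA, 20.9)

# Same seed: the imputed values are identical across runs.
run1 <- impute_random_sketch(x, seed = 3312)
run2 <- impute_random_sketch(x, seed = 3312)
identical(run1, run2)

# Different seed: the random draws generally differ.
run3 <- impute_random_sketch(x, seed = 1)
identical(run1, run3)
```

Any method that involves random draws can give slightly different imputed values on each run unless the seed is fixed.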
An imp_df object, which is a data frame of protein intensities with no missing values.
Chathurani Ranathunge
Lazar, Cosmin, et al. "Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies." Journal of proteome research 15.4 (2016): 1116-1125.
More information on the available imputation methods can be found in their respective packages.
create_df
For the minProb and minDet methods, see the imputeLCMD package.
For the Random Forest (RF) method, see missForest.
For the kNN method, see kNN from the VIM package.
For the SVD method, see pca from the pcaMethods package.
## Generate a raw_df object with default settings. No technical replicates.
raw_df <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt"
)
## Impute missing values in the data frame using the default minProb
## method.
imp_df1 <- impute_na(raw_df, seed = 3312)
## Impute using the RF method with the number of iterations set at 5
## and number of trees set at 100.
imp_df2 <- impute_na(raw_df,
  method = "RF",
  maxiter = 5, ntree = 100,
  seed = 3312
)
## Using the kNN method.
imp_df3 <- impute_na(raw_df, method = "kNN", seed = 3312)
## Using the SVD method with n_pcs set to 3.
imp_df4 <- impute_na(raw_df, method = "SVD", n_pcs = 3, seed = 3312)
## Using the minDet method with q set at 0.001.
imp_df5 <- impute_na(raw_df, method = "minDet", q = 0.001, seed = 3312)
## Impute a normalized data set using the kNN method
imp_df6 <- impute_na(ecoli_norm_df, method = "kNN")