impute_na: Impute missing values
In promor: Proteomics Data Analysis and Modeling Tools

impute_na

R Documentation

Impute missing values

Description

This function imputes missing values using a user-specified imputation method.

Usage

impute_na(
  df,
  method = "minProb",
  tune_sigma = 1,
  q = 0.01,
  maxiter = 10,
  ntree = 20,
  n_pcs = 2,
  seed = NULL
)

Arguments

`df`	A `raw_df` object (output of `create_df`) containing missing values or a `norm_df` object after performing normalization.
`method`	Imputation method to use. Default is `"minProb"`. Other available methods: `"minDet"`, `"RF"`, `"kNN"`, and `"SVD"`.
`tune_sigma`	A scalar used in the `"minProb"` method for controlling the standard deviation of the Gaussian distribution from which random values are drawn for imputation. Default is 1.
`q`	A scalar used in the `"minProb"` and `"minDet"` methods to obtain a low intensity value for imputation. `q` should be set to a very low value. Default is 0.01.
`maxiter`	Maximum number of iterations to be performed when using the `"RF"` method. Default is `10`.
`ntree`	Number of trees to grow in each forest when using the `"RF"` method. Default is `20`.
`n_pcs`	Number of principal components to calculate when using the `"SVD"` method. Default is 2.
`seed`	Numeric. Random number seed. Default is `NULL`.
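
To make the roles of `q` and `tune_sigma` more concrete, here is a minimal base R sketch of how minDet-style and minProb-style imputation are commonly described. It is an illustration under assumptions, not promor's internal code: treating `q` as a per-sample quantile, estimating the spread as the median of protein-wise standard deviations, and the helper names `impute_min_det` and `impute_min_prob` are all assumptions made for this example.

```r
# Illustrative sketch only; not promor's internal code.
# Rows are proteins, columns are samples, NA marks a missing intensity.
set.seed(3312)
mat <- matrix(rnorm(60, mean = 25, sd = 2), nrow = 10)
mat[cbind(1:6, 1:6)] <- NA  # one missing value per sample

# minDet-style: replace each sample's NAs with that sample's q-th quantile
# (assumption: q is interpreted as a quantile of the observed values).
impute_min_det <- function(x, q = 0.01) {
  for (j in seq_len(ncol(x))) {
    low <- quantile(x[, j], probs = q, na.rm = TRUE, names = FALSE)
    x[is.na(x[, j]), j] <- low
  }
  x
}

# minProb-style: draw replacements from a Gaussian centred on the q-th
# quantile; tune_sigma scales the standard deviation of that Gaussian.
impute_min_prob <- function(x, q = 0.01, tune_sigma = 1) {
  # Assumed spread estimate: median of the protein-wise standard deviations.
  prot_sd <- median(apply(x, 1, sd, na.rm = TRUE), na.rm = TRUE)
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    low <- quantile(x[, j], probs = q, na.rm = TRUE, names = FALSE)
    x[miss, j] <- rnorm(sum(miss), mean = low, sd = tune_sigma * prot_sd)
  }
  x
}

det_out <- impute_min_det(mat, q = 0.01)
prob_out <- impute_min_prob(mat, q = 0.01, tune_sigma = 1)
```

In this sketch, a smaller `q` pushes the imputed values lower, and a smaller `tune_sigma` keeps the random draws closer to that low value.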

Details

Ideally, first remove proteins with high levels of missing data using the filterbygroup_na function, then run impute_na on the raw_df or norm_df object.
The impute_na function imputes missing values using one of the available methods: minProb, minDet, kNN, RF, or SVD.
Note: Some imputation methods may require that the data be normalized prior to imputation.
Make sure to fix the random number seed with the seed argument for reproducibility.
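
The two recommendations above (filter first, fix the seed) can be sketched in base R. This is a simplified illustration, not promor code: the 50% cutoff, the toy matrix, and the helper `random_fill` are hypothetical, and unlike filterbygroup_na the filter here ignores experimental groups.

```r
# Illustrative sketch in base R; values and helper names are made up.
mat <- matrix(c(20, 21, NA, 22,
                NA, NA, NA, 19,
                23, 24, 25, NA), nrow = 3, byrow = TRUE)

# Step 1: drop proteins (rows) with more than half of their values missing.
frac_missing <- rowMeans(is.na(mat))
kept <- mat[frac_missing <= 0.5, , drop = FALSE]

# Step 2: a toy random imputation that optionally fixes the seed first.
random_fill <- function(x, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n_miss <- sum(is.na(x))
  x[is.na(x)] <- rnorm(n_miss, mean = min(x, na.rm = TRUE), sd = 1)
  x
}

# With the same seed, two runs give identical imputed values.
same_seed <- identical(random_fill(kept, seed = 3312),
                       random_fill(kept, seed = 3312))
```

Methods that draw random values behave like `random_fill` in this respect: without a fixed seed, repeated runs on the same input can return different imputed values.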

Value

An imp_df object, which is a data frame of protein intensities with no missing values.
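
A quick way to confirm that a returned data frame has no missing values is to count `NA`s per column before and after imputation. The sketch below uses toy data frames as stand-ins (the values and the placeholder fill are made up for illustration); the same checks could be applied to an actual imp_df object.

```r
# Toy stand-ins for a data frame before and after imputation (made-up values).
before <- data.frame(s1 = c(20.1, NA, 22.3), s2 = c(NA, 19.8, 21.0))
after <- before
after[is.na(after)] <- 18.5  # placeholder fill, not a real imputation method

# Count missing values per sample column.
na_before <- colSums(is.na(before))
na_after <- colSums(is.na(after))

# Fails with an error if any missing value remains.
stopifnot(!anyNA(after))
```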

Author(s)

Chathurani Ranathunge

References

Lazar, Cosmin, et al. "Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies." Journal of Proteome Research 15.4 (2016): 1116-1125.

Examples

## Generate a raw_df object with default settings. No technical replicates.
raw_df <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt"
)

## Impute missing values in the data frame using the default minProb
## method.
imp_df1 <- impute_na(raw_df, seed = 3312)


## Impute using the RF method with the number of iterations set at 5
## and number of trees set at 100.
imp_df2 <- impute_na(raw_df,
  method = "RF",
  maxiter = 5, ntree = 100,
  seed = 3312
)


## Using the kNN method.
imp_df3 <- impute_na(raw_df, method = "kNN", seed = 3312)



## Using the SVD method with n_pcs set to 3.
imp_df4 <- impute_na(raw_df, method = "SVD", n_pcs = 3, seed = 3312)

## Using the minDet method with q set at 0.001.
imp_df5 <- impute_na(raw_df, method = "minDet", q = 0.001, seed = 3312)

## Impute a normalized data set (the ecoli_norm_df example data) using the kNN method.
imp_df6 <- impute_na(ecoli_norm_df, method = "kNN")

promor documentation built on July 26, 2023, 5:39 p.m.