impute: Imputation of missing values
In protti: Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools

impute

R Documentation

Imputation of missing values

Description

impute calculates imputation values for missing data according to the selected method.

Usage

impute(
  data,
  sample,
  grouping,
  intensity_log2,
  condition,
  comparison = comparison,
  missingness = missingness,
  noise = NULL,
  method = "ludovic",
  skip_log2_transform_error = FALSE,
  retain_columns = NULL
)

Arguments

`data`	a data frame that is ideally the output from the `assign_missingness` function. It should contain at least the input variables. For each "reference_vs_treatment" comparison, both the reference and the treatment condition should be present. That means the reference condition should be duplicated once for every treatment.
`sample`	a character column in the `data` data frame that contains the sample names.
`grouping`	a character column in the `data` data frame that contains the precursor or peptide identifiers.
`intensity_log2`	a numeric column in the `data` data frame that contains the intensity values.
`condition`	a character or numeric column in the `data` data frame that contains the conditions.
`comparison`	a character column in the `data` data frame that contains the comparisons of treatment/reference pairs. This is an output of the `assign_missingness` function.
`missingness`	a character column in the `data` data frame that contains the missingness type of the data, which determines how values for imputation are sampled. This should at least contain `"MAR"` or `"MNAR"`. Missingness assigned as `NA` will not be imputed.
`noise`	a numeric column in the `data` data frame that contains the noise value for the precursor/peptide. It is only required if `method = "noise"`. Note: Noise values need to be log2 transformed.
`method`	a character value that specifies the method to be used for imputation. For `method = "ludovic"`, MNAR missingness is sampled from a normal distribution centered on a value three log2 units below the lowest intensity recorded for the precursor/peptide, with a spread equal to the mean standard deviation for the precursor/peptide. For `method = "noise"`, MNAR missingness is sampled from a normal distribution centered on the mean noise for the precursor/peptide, with a spread equal to the mean standard deviation (from each condition) for the precursor/peptide. Both methods impute MAR data using the mean and variance of the condition with the missing data.
`skip_log2_transform_error`	a logical value that determines if a check is performed to validate that input values are log2 transformed. If input values are > 40, the check fails and an error is returned.
`retain_columns`	a vector that indicates columns that should be retained from the input data frame. By default, no additional columns are retained (`retain_columns = NULL`). Specific columns can be retained by providing their names (not in quotation marks, just like other column names, but in a vector).
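
The sampling rules described for `method = "ludovic"` can be illustrated with a minimal base R sketch. This is not the protti implementation: the toy data frame `peptide` and all variable names are hypothetical, and the sketch covers a single precursor/peptide only. It assumes the "mean standard deviation" is the average of the per-condition standard deviations, ignoring conditions without observed values.

```r
set.seed(1) # makes the sketch reproducible

# Hypothetical log2 intensities for one peptide in two conditions.
# "control" has one missing value (treated as MAR here),
# "treated" is completely missing (treated as MNAR here).
peptide <- data.frame(
  condition = rep(c("control", "treated"), each = 4),
  intensity_log2 = c(20.1, 19.8, 20.4, NA, NA, NA, NA, NA)
)

# Simple stand-in for the documented log2 check: values above 40
# suggest that intensities were not log2 transformed.
stopifnot(all(peptide$intensity_log2 <= 40, na.rm = TRUE))

# Standard deviation per condition; a condition without observed
# values yields NA and is dropped when averaging.
sd_by_condition <- tapply(
  peptide$intensity_log2, peptide$condition, sd,
  na.rm = TRUE
)
mean_sd <- mean(sd_by_condition, na.rm = TRUE)

# MNAR: normal distribution centered three log2 units below the
# lowest observed intensity, with the mean standard deviation as spread.
mnar_center <- min(peptide$intensity_log2, na.rm = TRUE) - 3
mnar_values <- rnorm(4, mean = mnar_center, sd = mean_sd)

# MAR: normal distribution using the mean and standard deviation
# of the condition that contains the missing value.
control <- peptide$intensity_log2[peptide$condition == "control"]
mar_value <- rnorm(
  1,
  mean = mean(control, na.rm = TRUE),
  sd = sd(control, na.rm = TRUE)
)
```

For `method = "noise"`, the same sketch would apply with `mnar_center` replaced by the mean of the (log2 transformed) noise values for the precursor/peptide.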

Value

A data frame that contains an `imputed_intensity` and an `imputed` column in addition to the required input columns. The `imputed` column indicates if a value was imputed. The `imputed_intensity` column contains imputed intensity values for previously missing intensities.
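
As a rough illustration of how the `imputed` column might be used to check how much of a data set was imputed, the following base R sketch tallies imputed values per peptide. It assumes that `imputed` is a logical column; the data frame `toy_output` is a hypothetical stand-in with made-up values, not actual output of `impute`.

```r
# Hypothetical stand-in for the output of impute(); values are made up.
# Assumption: the imputed column is logical (TRUE = value was imputed).
toy_output <- data.frame(
  peptide = c("pep_1", "pep_1", "pep_2", "pep_2"),
  imputed = c(FALSE, TRUE, FALSE, FALSE)
)

# Number of imputed values per peptide
imputed_per_peptide <- tapply(toy_output$imputed, toy_output$peptide, sum)

# Fraction of all values that were imputed
imputed_fraction <- mean(toy_output$imputed)
```

A high imputed fraction for a peptide is a hint to treat downstream statistics for that peptide with some caution.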

Examples

set.seed(123) # Makes example reproducible

# Create example data
data <- create_synthetic_data(
  n_proteins = 10,
  frac_change = 0.5,
  n_replicates = 4,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)

head(data, n = 24)

# Assign missingness information
data_missing <- assign_missingness(
  data,
  sample = sample,
  condition = condition,
  grouping = peptide,
  intensity = peptide_intensity_missing,
  ref_condition = "all",
  retain_columns = c(protein, peptide_intensity)
)

head(data_missing, n = 24)

# Perform imputation
data_imputed <- impute(
  data_missing,
  sample = sample,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing,
  condition = condition,
  comparison = comparison,
  missingness = missingness,
  method = "ludovic",
  retain_columns = c(protein, peptide_intensity)
)

head(data_imputed, n = 24)

protti documentation built on Oct. 22, 2024, 1:06 a.m.