replace_missing: Replace missing values ('NA's)

View source: R/metapipe.R

replace_missing    R Documentation

Replace missing values (NAs)

Description

Replace missing values (NAs) in a dataset. The user can choose between two actions to handle missing data:

  1. Drop traits (variables) whose proportion of missing (NA) to total observations exceeds a given threshold, prop_na.

  2. Replace missing values by half of the minimum within each trait.

Finally, any traits for which all entries are missing are removed from the dataset and stored in an external CSV file called "<out_prefix>_NA_raw_data.csv".
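The handling of fully missing traits can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frame, the use of tempdir(), and the variable names are assumptions made for illustration only.

```r
# Hypothetical toy data; not part of the package
toy <- data.frame(ID = 1:3, T1 = c(0.4, NA, 1.1), T2 = NA_real_)

# Identify traits for which every entry is missing
all_na <- vapply(toy, function(x) all(is.na(x)), logical(1))

# Store the fully missing traits in a CSV file
# (written to a temporary directory here; the file name mimics the
# "<out_prefix>_NA_raw_data.csv" pattern with out_prefix = "metapipe")
out_file <- file.path(tempdir(), "metapipe_NA_raw_data.csv")
write.csv(toy[, all_na, drop = FALSE], out_file, row.names = FALSE)

# Keep only the traits with at least one observed value
toy_clean <- toy[, !all_na, drop = FALSE]
names(toy_clean)
```

In this sketch T2 is the only trait written to the CSV file, while ID and T1 remain in toy_clean.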

Usage

replace_missing(
  raw_data,
  excluded_columns = NULL,
  out_prefix = "metapipe",
  prop_na = 0.5,
  replace_na = FALSE
)

Arguments

raw_data

Data frame containing the raw data.

excluded_columns

Numeric vector containing the indices of the non-numeric columns (dataset properties) to exclude.

out_prefix

Prefix for output files and plots.

prop_na

Threshold for the proportion of missing to total observations. If a trait exceeds this threshold and replace_na = FALSE, it is dropped.
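To make the threshold concrete, the following base R sketch computes the proportion of NAs per trait and keeps only the traits that do not exceed it. It is an illustration under stated assumptions, not the package's code: the toy values are made up, and whether the package's comparison is strict or inclusive is assumed here.

```r
# Hypothetical toy traits; the values are illustrative only
toy <- data.frame(T3 = c(NA, 0.3, -1.2, 0.8, 0.1),    # 20 % NAs
                  T4 = c(NA, 1.2, -0.5, NA, 0.87))    # 40 % NAs

# Proportion of missing values per trait
na_prop <- colMeans(is.na(toy))

# Assumed rule: a trait is dropped when its proportion exceeds prop_na
prop_na <- 0.25
kept <- toy[, na_prop <= prop_na, drop = FALSE]
names(kept)
```

With prop_na = 0.25, T3 (20 % missing) is kept and T4 (40 % missing) is dropped; with the default of 0.5, both would be kept.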

replace_na

Logical flag indicating whether missing values should be replaced by half of the minimum value within each trait.
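The half-minimum replacement can be sketched for a single trait in base R. The vector and variable names below are hypothetical, and the sketch only illustrates the rule described above rather than reproducing the package's internals.

```r
# Hypothetical trait with two missing values
t4 <- c(NA, 1.2, -0.5, NA, 0.87)

# Half of the minimum observed value within the trait
half_min <- min(t4, na.rm = TRUE) / 2

# Replace every NA with that value
t4_filled <- replace(t4, is.na(t4), half_min)
t4_filled
```

Here the minimum observed value is -0.5, so both missing entries are replaced with -0.25.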

Value

Data frame containing the raw data without missing values.

Examples

                                        
# Toy dataset                                        
example_data <- data.frame(ID = c(1,2,3,4,5), 
                           P1 = c("one", "two", "three", "four", "five"), 
                           T1 = rnorm(5), 
                           T2 = rnorm(5),
                           T3 = c(NA, rnorm(4)),                  #  20 % NAs
                           T4 = c(NA, 1.2, -0.5, NA, 0.87),       #  40 % NAs
                           T5 = NA)                               # 100 % NAs
MetaPipe::replace_missing(example_data, c(1, 2))
MetaPipe::replace_missing(example_data, c(1, 2), prop_na = 0.25)
MetaPipe::replace_missing(example_data, c(1, 2), replace_na = TRUE)


# F1 Seedling Ionomics dataset
data(ionomics) # Includes some missing data
ionomics_rev <- MetaPipe::replace_missing(ionomics, c(1, 2))
ionomics_rev <- MetaPipe::replace_missing(ionomics, 
                                          excluded_columns = c(1, 2), 
                                          prop_na = 0.025)
ionomics_rev <- MetaPipe::replace_missing(ionomics, 
                                          excluded_columns = c(1, 2),
                                          replace_na = TRUE)
knitr::kable(ionomics_rev[1:5, 1:8])

# Clean up example outputs
MetaPipe:::tidy_up("metapipe_")

villegar/MetaPipe documentation built on Nov. 22, 2022, 10:44 p.m.