metadata_sparsity: metadata_sparsity

View source: R/metadata_sparsity.R

metadata_sparsityR Documentation

metadata_sparsity

Description

Checks the number of missing values in each sample. This function can be used to check how sparse the metadata is. In human studies, it is easy to have sparse metadata which inadvertently gives subsets of samples simply based on the amount of available information. This function tallies the number of NA in each sample and returns subsets of samples based on the number of missing values they have.

Usage

metadata_sparsity(met_df)

Arguments

met_df

metadata dataframe with samples in rows and metadata variables in columns

Value

list. na_tally is the first item in the list. It is a tally of the number of samples with a given number of missing metadata variables. subsequent items in the list are the subset of samples according to the number of NA metadata variables they have.

Examples

set.seed(1)
metadata_example <- data.frame(
   sampleID = LETTERS[1:10],
   group = c(rep(1:2, each = 3), rep(3, 4)),
   age = c(rnorm(6, 30, 5), rep(NA, 4)),
   sex = c(rep('F', 3), rep(NA, 4), rep('M', 3)),
   ethnicity = sample(c(NA,1,2,3), 10, replace = TRUE),
   medication = sample(c(NA,1,2), 10, replace=TRUE))

met_sparse <- metadata_sparsity(metadata_example)
summary(met_sparse)
met_sparse$na_tally
met_sparse[[3]]
met_sparse[[4]]

OxfordCMS/OCMSutility documentation built on Feb. 27, 2025, 8:19 p.m.