do_reduction: Reduce missing data



Reduce missing data

Description

Drop rows from a data set until, for every pair of remaining rows, the number of pairwise defined columns is at least a minimum acceptable number. For a given pair of rows, a column is pairwise defined if its elements are defined for both rows (i.e. neither row has NA in that column).
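The pairwise defined counts for all pairs of rows can be computed at once with a matrix product. The following is a minimal sketch for illustration only: `pairwise_counts` is a hypothetical helper, not part of the package, and the small data frame is made up.

```r
# Hypothetical helper (not part of ANTTV): count pairwise defined columns
# for every pair of rows in a data frame.
pairwise_counts <- function(fr) {
  # 1 where an element is defined (non-NA), 0 where it is NA
  m <- (!is.na(as.matrix(fr))) * 1
  # Entry [i, j] is the number of columns defined in both row i and row j;
  # the diagonal holds each row's own number of defined elements.
  m %*% t(m)
}

# Made-up example: each row has two defined elements, and each pair of
# distinct rows shares exactly one defined column.
fr <- data.frame(a = c(1, NA, 3), b = c(NA, 2, 3), c = c(1, 2, NA),
                 row.names = c("r1", "r2", "r3"))
pc <- pairwise_counts(fr)
```

Here `pc["r1", "r2"]` is 1 because column `c` is the only column defined for both `r1` and `r2`.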

Usage

do_reduction(fr, n = 15, keep = "", report = FALSE)

Arguments

fr

A data frame.

n

Minimum acceptable number of pairwise defined columns in the reduced data frame. Values can range from 1 to the number of columns. (Default = 15.)

keep

A row to be protected from elimination, provided that it has enough defined (i.e. non-NA) elements to satisfy the minimum acceptable number condition.

report

Whether to report which rows are being dropped.

Details

Reduction is achieved by dropping rows that cause the number of pairwise defined columns to fall below the minimum acceptable number. The process has two stages. In the first stage, all rows with fewer than the minimum acceptable number of defined elements are dropped, because such a row cannot satisfy the condition in combination with any other row. The second stage is iterative. At each step, the rows associated with the smallest number of pairwise defined columns are identified and one of them is dropped. (A row that has been nominated to be kept will not be dropped at this stage.) Row elimination continues until the smallest pairwise count is at least the minimum acceptable number.
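The two-stage procedure can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: `reduce_sketch` is a hypothetical name, it omits the `report` argument, it assumes `keep` is compared against row names, and when several rows share the smallest count it simply drops the first of them (the tie-breaking rule used by `do_reduction` is not stated here).

```r
# Hypothetical sketch of the two-stage reduction (not the ANTTV code).
reduce_sketch <- function(fr, n = 15, keep = "") {
  # Stage 1: drop rows with fewer than n defined (non-NA) elements.
  defined <- rowSums(!is.na(fr))
  fr <- fr[defined >= n, , drop = FALSE]

  # Stage 2: repeatedly drop one row from the worst pair.
  repeat {
    if (nrow(fr) < 2) break                  # no pairs left to check
    m <- (!is.na(as.matrix(fr))) * 1
    counts <- m %*% t(m)                     # pairwise defined counts
    diag(counts) <- Inf                      # ignore each row paired with itself
    row_min <- apply(counts, 1, min)         # each row's worst pairing
    worst <- min(row_min)
    if (worst >= n) break                    # condition met for every pair
    candidates <- which(row_min == worst)
    # Assumption: keep is matched against row names.
    candidates <- candidates[rownames(fr)[candidates] != keep]
    if (length(candidates) == 0) break
    # Assumption: drop the first candidate when there is a tie.
    fr <- fr[-candidates[1], , drop = FALSE]
  }
  fr
}

fr <- data.frame(a = c(1, 1, NA, 1), b = c(1, 1, NA, 1),
                 c = c(1, 1, 1, 1), d = c(1, NA, 1, 1),
                 row.names = c("r1", "r2", "r3", "r4"))
```

With `n = 2`, rows `r2` and `r3` share only one defined column, so one of them must go; the sketch drops `r2`, or `r3` if `keep = "r2"`. With `n = 3`, `r3` is removed in the first stage because it has only two defined elements.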

This code incorporates suggestions by Bill Venables (see Jan 2014 archive at https://list.science.auckland.ac.nz/sympa/arc/stat-rdownunder).

Value

A data frame where the number of pairwise defined columns is at least equal to the minimum acceptable number for every combination of two rows.
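The condition guaranteed for the returned data frame can be checked directly. The function below is a hypothetical helper written for illustration; it is not part of the package.

```r
# Hypothetical checker (not part of ANTTV): TRUE if every pair of distinct
# rows shares at least n defined (non-NA) columns.
satisfies_min_pairwise <- function(fr, n) {
  if (nrow(fr) < 2) return(TRUE)            # no pairs to check
  m <- (!is.na(as.matrix(fr))) * 1
  counts <- m %*% t(m)                      # pairwise defined counts
  all(counts[upper.tri(counts)] >= n)       # each distinct pair once
}

fr <- data.frame(a = c(1, NA, 3), b = c(NA, 2, 3), c = c(1, 2, NA))
```

In this made-up data frame every pair of rows shares exactly one defined column, so the check passes for `n = 1` and fails for `n = 2`. Applied to the output of `do_reduction(fr, n)`, the same check with the same `n` would be expected to return `TRUE`.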


tjfinney/ANTTV documentation built on July 1, 2024, 11 p.m.