sample_na_loc: Sample Missing-Value Locations with Constraints

View source: R/tune_imp.R

sample_na_loc R Documentation

Sample Missing-Value Locations with Constraints

Description

Sample matrix indices for NA injection while respecting row and column missingness limits and avoiding zero-variance columns.

Usage

sample_na_loc(
  obj,
  n_cols = NULL,
  n_rows = 2L,
  num_na = NULL,
  n_reps = 1L,
  rowmax = 0.9,
  colmax = 0.9,
  na_col_subset = NULL,
  max_attempts = 100
)

Arguments

obj

A numeric matrix.

n_cols

Integer or NULL. Number of columns to receive injected missing values. Required when num_na is NULL.

n_rows

Integer. Target number of missing values to inject per selected column.

num_na

Integer or NULL. Total number of missing values to inject per repetition. If supplied, n_cols is derived from num_na and n_rows, and missing values are distributed as evenly as possible across columns.

n_reps

Integer. Number of independent repetitions.

rowmax

Numeric scalar between 0 and 1. Maximum allowed missing-data proportion per row after injection.

colmax

Numeric scalar between 0 and 1. Maximum allowed missing-data proportion per column after injection.

na_col_subset

Optional integer or character vector restricting which columns are eligible for missing-value injection.

max_attempts

Integer. Maximum number of resampling attempts per repetition before giving up.
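
To illustrate the num_na argument, the base R sketch below shows one way a total could be split as evenly as possible across columns. The helper name split_na_evenly and the rule n_cols = ceiling(num_na / n_rows) are assumptions made for this example. The documentation does not state how sample_na_loc derives n_cols internally.

```r
# Hypothetical helper (not part of slideimp): split a total number of NAs
# as evenly as possible across columns.
split_na_evenly <- function(num_na, n_rows) {
  # Assumed rule: use just enough columns that none needs more than n_rows NAs.
  n_cols <- ceiling(num_na / n_rows)
  base <- num_na %/% n_cols    # every column gets at least this many
  extra <- num_na %% n_cols    # the first `extra` columns get one more
  base + as.integer(seq_len(n_cols) <= extra)
}

# 7 NAs with at most 2 per column: counts per column are 2 2 2 1
split_na_evenly(num_na = 7, n_rows = 2)
```

Under this assumed rule, per-column counts differ by at most one and never exceed n_rows.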

Details

The function searches for valid NA locations with a greedy stochastic procedure, resampling up to max_attempts times per repetition. It ensures that:

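  • The proportion of missing values in each row (existing plus injected) does not exceed rowmax, and the proportion in each column does not exceed colmax.

  • At least two distinct observed values are preserved in every affected column, so that no column becomes zero-variance.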
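
The two constraints above can be checked for any candidate set of locations with a few lines of base R. The sketch below is not the package's implementation. The helper name locs_are_valid is hypothetical, and treating the limits as inclusive (<=) is an assumption.

```r
# Hypothetical checker (not part of slideimp): does a set of candidate
# NA locations satisfy the documented constraints?
locs_are_valid <- function(obj, locs, rowmax = 0.9, colmax = 0.9) {
  injected <- obj
  injected[locs] <- NA    # two-column (row, column) matrix indexing
  # Missing-data proportions after injection (pre-existing NAs count too).
  row_ok <- all(rowMeans(is.na(injected)) <= rowmax)
  col_ok <- all(colMeans(is.na(injected)) <= colmax)
  # Every affected column must keep at least two distinct observed values.
  affected <- unique(locs[, 2])
  var_ok <- all(vapply(affected, function(j) {
    length(unique(injected[!is.na(injected[, j]), j])) >= 2
  }, logical(1)))
  row_ok && col_ok && var_ok
}

mat <- matrix(c(1, 1, 2, 3, 4, 5), nrow = 3)  # column 1 is 1, 1, 2
locs_are_valid(mat, cbind(3, 1))  # FALSE: column 1 would keep only the value 1
locs_are_valid(mat, cbind(1, 2))  # TRUE: column 2 keeps 4 and 5
```

A check like this can also be run on the output of sample_na_loc as a sanity test before injecting the sampled locations.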

Value

A list of length n_reps. Each element is a two-column integer matrix with row and column indices for sampled NA locations.
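
A two-column index matrix of this kind can be passed directly to `[` to address individual cells. The sketch below uses a hand-built stand-in for the return value with n_reps = 2. The index values and the column names row and col are made up for illustration and are not taken from the package.

```r
mat <- matrix(1:12, nrow = 3)

# Hand-built stand-in for a return value with n_reps = 2 (made-up indices).
locs <- list(
  cbind(row = c(1L, 3L), col = c(2L, 4L)),
  cbind(row = c(2L, 2L), col = c(1L, 3L))
)

# Apply each repetition to its own copy of the original matrix.
injected <- lapply(locs, function(idx) {
  out <- mat
  out[idx] <- NA
  out
})

sapply(injected, function(m) sum(is.na(m)))  # 2 NAs in each copy
```

Working on a copy per repetition keeps the repetitions independent and leaves the original matrix untouched.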

Examples

set.seed(123)
mat <- matrix(runif(100), nrow = 10)

# Sample 5 missing values across 5 columns
locs <- sample_na_loc(mat, n_cols = 5, n_rows = 1)
locs

# Inject the missing values from the first repetition
mat[locs[[1]]] <- NA
mat


slideimp documentation built on June 17, 2026, 1:08 a.m.