fill_NA_N: 'fill_NA_N' function for the multiple imputations purpose

View source: R/fill_NA_N.R

fill_NA_N    R Documentation

fill_NA_N function for the multiple imputations purpose

Description

Fills missing data using multiple imputations. Non-missing independent variables are used to approximate the missing observations of a dependent variable. The quantitative models are built with Rcpp and the C++ library Armadillo.

Usage

fill_NA_N(
  x,
  model,
  posit_y,
  posit_x,
  w = NULL,
  logreg = FALSE,
  k = 10,
  ridge = 1e-06
)

## S3 method for class 'data.frame'
fill_NA_N(
  x,
  model,
  posit_y,
  posit_x,
  w = NULL,
  logreg = FALSE,
  k = 10,
  ridge = 1e-06
)

## S3 method for class 'data.table'
fill_NA_N(
  x,
  model,
  posit_y,
  posit_x,
  w = NULL,
  logreg = FALSE,
  k = 10,
  ridge = 1e-06
)

## S3 method for class 'matrix'
fill_NA_N(
  x,
  model,
  posit_y,
  posit_x,
  w = NULL,
  logreg = FALSE,
  k = 10,
  ridge = 1e-06
)

Arguments

x

a numeric matrix or data.frame/data.table (factor/character/numeric/logical) - variables

model

a character - possible options ("lm_bayes","lm_noise","pmm")

posit_y

an integer/character - a position/name of dependent variable

posit_x

an integer/character vector - positions/names of independent variables

w

a numeric vector - a weighting variable - only positive values, Default: NULL

logreg

a boolean - set to TRUE if the (numeric) dependent variable has a log-normal distribution. If TRUE, a log-regression is fitted and the exponential of the results is returned. Default: FALSE

k

an integer - the number of multiple imputations or, for pmm, the number of closest points from which one random value is taken. Default: 10

ridge

a numeric - a value added to diagonal elements of the x'x matrix, Default: 1e-6
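The roles of ridge and of k under "pmm" can be sketched in base R. This is only an illustration under stated assumptions, not the package's C++ implementation: it uses the built-in airquality data instead of air_miss, a plain least-squares fit, and hypothetical helper names (Xo, pred, donors, y_imp). The actual routine may differ in details such as how the coefficients are drawn.

```r
# Illustrative base-R sketch; the package itself does this work in C++.
set.seed(123)
df <- airquality
X <- cbind(Intercept = 1, as.matrix(df[, c("Wind", "Temp")]))
y <- df$Ozone
obs <- !is.na(y)  # Wind and Temp have no missing values in airquality

ridge <- 1e-6
k <- 10

# Ridge-stabilised least squares on the observed rows:
# beta = (X'X + ridge * I)^(-1) X'y
Xo <- X[obs, , drop = FALSE]
beta <- solve(crossprod(Xo) + diag(ridge, ncol(Xo)), crossprod(Xo, y[obs]))
pred <- as.vector(X %*% beta)

# PMM-style step: for each missing y, take the k observed rows whose
# predictions are closest and draw one of their observed values at random.
y_imp <- y
for (i in which(!obs)) {
  donors <- order(abs(pred[obs] - pred[i]))[seq_len(k)]
  y_imp[i] <- y[obs][donors[sample.int(k, 1)]]
}
```

In this sketch every imputed value is an actually observed Ozone value, which is the defining property of predictive mean matching; a larger k widens the pool of candidate donors.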

Value

a vector of imputations in numeric/character/factor format (matching the input type)

Methods (by class)

  • fill_NA_N(data.frame): S3 method for data.frame

  • fill_NA_N(data.table): S3 method for data.table

  • fill_NA_N(matrix): S3 method for matrix

Note

It is assumed that users add the intercept column themselves. The miceFast module provides the most efficient environment; the second recommended option is data.table with a numeric matrix. Only "lm_bayes", "lm_noise", and "pmm" models are supported. The model is fitted only when the number of complete observations exceeds the number of independent variables.
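The two preconditions in this note, a user-supplied intercept column and enough complete observations, can be checked by hand before calling the function. The following is a minimal base-R sketch that assumes the built-in airquality data for illustration; the names n_complete and can_fit are hypothetical and not part of the package.

```r
# Base-R sketch using the built-in airquality data (an assumption made
# for illustration only).
df <- airquality
df$Intercept <- 1  # constant column added by the user

posit_y <- "Ozone"
posit_x <- c("Solar.R", "Wind", "Temp", "Intercept")

# Rows usable for fitting: the dependent variable and all independent
# variables are observed.
n_complete <- sum(complete.cases(df[, c(posit_y, posit_x)]))

# The condition described in the note, checked manually: the number of
# complete observations must exceed the number of independent variables.
can_fit <- n_complete > length(posit_x)
```

When grouping (for example by a groups column), the same check would apply within each group, since small groups are the ones most likely to fall below the threshold.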

See Also

fill_NA, VIF, vignette("miceFast-intro", package = "miceFast")

Examples

library(miceFast)
library(dplyr)
library(data.table)

data(air_miss)

# dplyr: PMM with 20 draws
air_miss %>%
  mutate(Ozone_pmm = fill_NA_N(
    x = ., model = "pmm",
    posit_y = "Ozone", posit_x = c("Solar.R", "Wind", "Temp"),
    k = 20
  ))

# dplyr: lm_noise with weights
air_miss %>%
  mutate(Ozone_imp = fill_NA_N(
    x = ., model = "lm_noise",
    posit_y = "Ozone",
    posit_x = c("Solar.R", "Wind", "Temp"),
    w = .[["weights"]],
    logreg = TRUE, k = 30
  ))

# data.table: PMM grouped
data(air_miss)
setDT(air_miss)
air_miss[, Ozone_pmm := fill_NA_N(
  x = .SD, model = "pmm",
  posit_y = "Ozone",
  posit_x = c("Wind", "Temp", "Intercept"),
  k = 20
), by = .(groups)]

# See the vignette for full examples:
# vignette("miceFast-intro", package = "miceFast")


miceFast documentation built on Feb. 26, 2026, 5:06 p.m.