ds.mice.pmm: Calculates imputations for univariate missing data by...

Description

This function performs imputation by predictive mean matching by executing the pmmDS function on the server-side.

Usage

ds.mice.pmm(y = NULL, ry = NULL, x = NULL, wy = NULL, donors = 5,
  matchtype = 1L, ridge = 1e-05, checks = TRUE, datasources = NULL, ...)

Arguments

y

Vector to be imputed

ry

Logical vector of length length(y) indicating the subset y[ry] of elements in y to which the imputation model is fitted. ry generally distinguishes the observed (TRUE) from the missing (FALSE) values in y.

x

Numeric design matrix with length(y) rows containing the predictors for y. The matrix x must not contain missing values.

wy

Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.

donors

The size of the donor pool from which a draw is made. The default is donors = 5L. Setting donors = 1L always selects the closest match, but is not recommended. Values between 3L and 10L provide the best results in most cases (Morris et al., 2015).

matchtype

Type of matching distance. The default choice (matchtype = 1L) calculates the distance between the predicted value of yobs and the drawn values of ymis (called type-1 matching). Other choices are matchtype = 0L (distance between predicted values) and matchtype = 2L (distance between drawn values).

ridge

The ridge penalty used in .norm.draw() to prevent problems with multicollinearity. The default is ridge = 1e-05, which means that 0.001 percent of the diagonal is added to the cross-product. Larger ridges may result in more biased estimates. For highly noisy data (e.g. many junk variables), set ridge = 1e-06 or even lower to reduce bias. For highly collinear data, set ridge = 1e-04 or higher.

...

Other named arguments.
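
To make the roles of donors and ridge more concrete, the following is a minimal local sketch of predictive mean matching in base R. It is not the pmmDS implementation and does not contact a server. The function name pmm_sketch and the simulated data are hypothetical. The sketch omits the random parameter draw that type-1 matching relies on and matches predicted values against predicted values, so it is closest in spirit to matchtype = 0L. The ridge step assumes that the penalty is ridge times the diagonal of the cross-product, as the ridge description states.

```r
# Simplified predictive mean matching on local data (illustrative only).
pmm_sketch <- function(y, ry, x, wy = !ry, donors = 5L, ridge = 1e-05) {
  x <- cbind(1, as.matrix(x))            # add an intercept column
  xobs <- x[ry, , drop = FALSE]
  yobs <- y[ry]

  # Ridge-stabilised least squares: add ridge * diag(X'X) to the cross-product
  xtx <- crossprod(xobs)
  pen <- ridge * diag(xtx)
  beta <- solve(xtx + diag(pen, nrow = length(pen)), crossprod(xobs, yobs))

  yhat_obs <- drop(xobs %*% beta)                    # predictions, observed cases
  yhat_mis <- drop(x[wy, , drop = FALSE] %*% beta)   # predictions, cases to impute

  # For each case to impute, pick one observed value at random
  # from the `donors` cases with the closest predicted value
  vapply(yhat_mis, function(target) {
    pool <- order(abs(yhat_obs - target))[seq_len(min(donors, length(yobs)))]
    yobs[pool[sample.int(length(pool), 1L)]]
  }, numeric(1))
}

# Simulated data with 10 missing outcomes
set.seed(1)
n <- 50
x <- matrix(rnorm(2 * n), ncol = 2)
y <- 1 + x[, 1] - 0.5 * x[, 2] + rnorm(n)
y[sample.int(n, 10)] <- NA
ry <- !is.na(y)

yimp <- pmm_sketch(y, ry, x, donors = 5L)
length(yimp) == sum(!ry)   # one imputation per missing value
all(yimp %in% y[ry])       # every imputation is an observed value
```

Because each imputed value is copied from an observed donor, imputations always stay within the range of the observed data. With donors = 1L the draw becomes deterministic given the fitted model, which is one reason that setting is discouraged.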

Value

Vector with imputed data, of the same type as y and of length sum(wy).
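
The relation between ry, wy and the length of the result can be checked locally with a small sketch. The toy vector is hypothetical, and the assumption that wy defaults to !ry when it is left as NULL is taken from the mice package; the source does not state it for this function.

```r
y  <- c(2.1, NA, 3.4, NA, 5.0)
ry <- !is.na(y)   # TRUE where y is observed
wy <- !ry         # TRUE where an imputation is wanted (assumed default)
sum(wy)           # expected length of the returned vector: 2
```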

Examples

# In this example, we assume that the Opal server to which we are connecting
# has a table that contains the 'boys' data from the original mice package.

# Load DataSHIELD libraries
library(dsBaseClient)
library(dsMiceClient)

# Build login information
server <- c("server_name")
url <- c("opal_url")
user <- "username"
password <- "password"
table <- c("project_name.table_name")
logindata <- data.frame(server,url,user,password,table)

# Log in and assign the 'boys' dataset to variable 'D' on the server-side
opals <- datashield.login(logins=logindata, assign=TRUE)

# Create the predictors x, the outcome y and the response indicator ry on the server-side
datashield.assign(opals, symbol="xname", value=as.symbol("c('age', 'hgt', 'wgt')"))
datashield.assign(opals, symbol="r", value=as.symbol("complete.cases(D[, xname])"))
datashield.assign(opals, symbol="x", value=as.symbol("D[r, xname]"))
datashield.assign(opals, symbol="y", value=as.symbol("D[r, 'tv']"))
datashield.assign(opals, symbol="ry", value=as.symbol("notNaDS(y)"))

# Impute missing tv data
yimp <- ds.mice.pmm('y','ry','x')
length(yimp)
table(yimp)
hist(table(yimp), xlab = 'Imputed missing tv')

gflcampos/dsMiceClient documentation built on May 3, 2019, 4:33 p.m.