rhat: Semi-supervised correlation estimation


Description

This function estimates the correlation between a covariate and an outcome that is available only for a small subset of the data. The outcome is imputed for all the data using a smoothed predictor learned from a set of surrogate variables, which are available for all the data.
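
The idea can be sketched in base R as follows. This is only an illustrative plug-in version, assuming a single surrogate and a hand-written Gaussian kernel smoother (nw_smooth, a hypothetical helper that is not part of the package). It is not the estimator implemented by rhat, and the simulated data and the bandwidth are arbitrary.

```r
set.seed(1)
N  <- 500   # total number of rows
nn <- 50    # number of labeled rows (outcome observed)

x <- rnorm(N)                      # covariate
y <- 0.6 * x + rnorm(N, sd = 0.8)  # outcome, treated as known only for rows 1:nn
s <- y + rnorm(N, sd = 0.5)        # surrogate, observed for every row

# Hypothetical helper: Nadaraya-Watson smoother with a Gaussian kernel
nw_smooth <- function(s_lab, y_lab, s_new, bw) {
  vapply(s_new, function(s0) {
    w <- dnorm((s_lab - s0) / bw)  # kernel weights of the labeled points
    sum(w * y_lab) / sum(w)        # weighted mean of the labeled outcomes
  }, numeric(1))
}

# Learn the predictor on the labeled rows, then impute the outcome for all rows
y_imputed <- nw_smooth(s[1:nn], y[1:nn], s, bw = 0.5)

# Naive plug-in correlation between the imputed outcome and the covariate
rho_plugin <- cor(y_imputed, x)

# Benchmark that uses the labeled rows only
rho_labeled <- cor(y[1:nn], x[1:nn])
```

Comparing rho_plugin with rho_labeled on simulated data gives a feel for what the unlabeled rows contribute; the package's estimator may combine the two sources of information differently.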

Usage

rhat(data, nn, outcome_name = NULL, covariate_name = NULL,
  surrogate_name = NULL, bw = NULL, cdf_trans = TRUE, weights = NULL,
  adjust_covariates_name = NULL, do_interact = TRUE)

Arguments

data

the data. The first nn rows should be the labeled data; the remaining rows should be the unlabeled data.

nn

the number of labeled observations, i.e. the number of leading rows of data for which the outcome is available

outcome_name

a character string containing the name of the column from data containing the partly missing outcome of interest

covariate_name

a character string containing the name of the column from data containing the covariate to be related to the outcome of interest

surrogate_name

a character string vector containing the name of the column(s) from data containing the surrogate variable(s)

bw

the bandwidth to use

cdf_trans

a logical flag indicating whether the smoothing should be performed on the data transformed with their cdf. Default is TRUE. See Details.

weights

a weighting vector of length nn in case a weighted version of the correlation has to be computed. Default is NULL, in which case, no weighting is done.

adjust_covariates_name

optional vector of names of the covariates to adjust for during imputation and smoothing. Default is NULL.

do_interact

logical flag indicating whether interactions between x and covariates should be taken into account when imputing y. Default is TRUE.
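
A minimal sketch of how the input could be prepared so that the labeled rows come first. The column names (y, x, s1, s2), the simulated values, and the use of NA for the unobserved outcome are assumptions made for illustration only. The call to rhat is guarded so that the snippet also runs when the sslcov package is not installed, and it relies on the default values of the remaining arguments.

```r
set.seed(42)
N  <- 200   # total number of rows
nn <- 40    # number of labeled rows

# Simulated data in which the labeled rows are scattered
full <- data.frame(
  y  = rnorm(N),
  x  = rnorm(N),
  s1 = rnorm(N),
  s2 = rnorm(N),
  labeled = rep(c(TRUE, FALSE), times = c(nn, N - nn))
)
full <- full[sample(N), ]

# Reorder so that the first nn rows are the labeled ones
dat <- full[order(!full$labeled), c("y", "x", "s1", "s2")]
rownames(dat) <- NULL

# Assumed encoding: the outcome is NA for the unlabeled rows
dat$y[(nn + 1):N] <- NA

# Guarded call; skipped when sslcov is not available
if (requireNamespace("sslcov", quietly = TRUE)) {
  fit <- sslcov::rhat(dat, nn,
                      outcome_name   = "y",
                      covariate_name = "x",
                      surrogate_name = c("s1", "s2"))
  str(fit)
}
```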

Details

Smoothing over the CDF-transformed data (cdf_trans = TRUE) prevents some tail estimation issues when the new data are large.
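
To illustrate why such a transform helps, the sketch below applies an empirical CDF transform to a heavy-tailed variable using base R. It assumes the transform is the plain empirical CDF; the package may compute it differently.

```r
set.seed(7)
s <- rt(300, df = 2)   # heavy-tailed surrogate with occasional extreme values

# Empirical CDF transform: each value is replaced by the proportion of
# observations less than or equal to it
u <- ecdf(s)(s)

range(s)   # can be very wide because of the tails
range(u)   # always within (0, 1]
```

The transform preserves the ordering of the observations while compressing the extremes, so a kernel smoother with a fixed bandwidth is not left with isolated points in the tails.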

Value

a list with the following elements:

See Also

smooth_ssl


stepcie/sslcov documentation built on May 30, 2019, 2:39 p.m.