distributed2party: Two Party Vertical Distributed Regression Analysis


Description

AnalysisCenter.2Party and DataPartner.2Party are used in conjunction with PopMedNet to perform linear, logistic, or Cox regression on data that has been partitioned vertically between two data partners. The data partner that holds the response variable(s) uses AnalysisCenter.2Party, and the other data partner uses DataPartner.2Party. Although the two data partners exchange information in order to perform the regression, the data itself is kept secure and is not shared, nor is any information shared that would allow one data partner to reconstruct part of the other data partner's data. Final coefficients and other regression statistics are computed by the analysis center and shared with the other data partner.
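To make "partitioned vertically" concrete, the following base R sketch splits a small synthetic data frame by columns. The data and column names are hypothetical, and the sketch assumes that both parties hold the same observations in the same row order; it does not call any function from this package.

```r
# Hypothetical toy data; the column names are made up for illustration.
set.seed(1)
full <- data.frame(
  y  = rnorm(6),   # response, held by the party acting as analysis center
  x1 = rnorm(6),
  x2 = rnorm(6),
  x3 = rnorm(6)
)

# Vertical partition: each party holds the same rows but a disjoint
# set of columns. (Row alignment is an assumption of this sketch.)
analysis_center_data <- full[, c("y", "x1")]
data_partner_data    <- full[, c("x2", "x3")]

# Same number of observations, no columns in common.
nrow(analysis_center_data) == nrow(data_partner_data)
length(intersect(names(analysis_center_data), names(data_partner_data))) == 0
```

In this layout, the first data frame would be the kind of input passed as data to AnalysisCenter.2Party (with response = "y"), and the second the kind of input passed to DataPartner.2Party.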

Usage

AnalysisCenter.2Party(regression = "linear", data = NULL, response = NULL,
                      strata = NULL, mask = TRUE, monitorFolder = NULL,
                      msreqid = "v_default_00_000", blocksize = 500,
                      tol = 1e-8, maxIterations = 25, sleepTime = 10,
                      maxWaitingTime = 86400, popmednet = TRUE,
                      trace = FALSE, verbose = TRUE)

DataPartner.2Party(regression = "linear", data = NULL, strata = NULL,
                   mask = TRUE, monitorFolder = NULL, sleepTime = 10,
                   maxWaitingTime = 86400, popmednet = TRUE,
                   trace = FALSE, verbose = TRUE)

Arguments

regression

the model to be used to fit the data. The default regression "linear" fits a least squares linear model to the data. Alternatively, "logistic" returns a fitted logistic model, and "cox" returns a fitted Cox proportional hazards model.

data

a data.frame or matrix which contains the data to be used in the model. For DataPartner.2Party(), all columns will be used as covariates in the regression. For AnalysisCenter.2Party(), all columns, with the exception of the column specified by response, will be used as covariates in the regression.

response

for "linear" and "logistic" regression, the name of the column in data which holds the response variable. If response = NULL, then the first column of data will be used as the response variable. For "cox" regression, response holds the name of the column which is the time to event and the name of the column which is the event type (0 = censored, 1 = event). If response = NULL, then the first column of data is assumed to be the time to event and the second column is assumed to be the event type.

strata

for "cox" regression only. A vector of character strings identifying the names of the covariates from either party which will be used as strata. Both AnalysisCenter.2Party and DataPartner.2Party must specify the same vector of strata.

mask

logical value: If FALSE, strata levels for the strata which belong to the party which specified FALSE will be identified by name. If TRUE, levels for the strata which belong to the party which specified TRUE will be put in a random order and level names will be changed to NA.

monitorFolder

the folder where the directories dplocal, inputfiles, macros, msoc, and rprograms are located.

msreqid

a character string specifying the name of the Request ID as specified when creating the Distributed Regression request on PopMedNet. Used for logging purposes only.

blocksize

the minimum size used to horizontally partition the data for data transfer between the two parties.

tol

the tolerance used to determine convergence in "logistic" and "cox" regression.

maxIterations

the maximum number of iterations to perform "logistic" or "cox" regression before non-convergence is declared.

sleepTime

the number of seconds to wait after writing the last file to disk before signalling the PMN Datamart Client that files are ready to be transferred.

maxWaitingTime

the number of seconds to wait to receive files before a transfer error is declared and the program halts execution.

popmednet

logical value: if TRUE, assumes that PopMedNet is being used to transfer the files and implements PopMedNet-specific routines. In particular, a 15-second offset in the termination of routines that execute in parallel is implemented.

trace

logical value: if TRUE and verbose == TRUE, prints every function called during execution. Used for debugging.

verbose

logical value. If TRUE, prints out information to document the progression of the computation.
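As an illustration of how tol and maxIterations typically interact in an iterative fit, here is a minimal base R sketch of Newton-Raphson for an ordinary (non-distributed) logistic regression on synthetic data. The stopping rule shown, the largest absolute change in any coefficient falling below tol, is an assumption made for this sketch; the criterion actually used by this package may differ.

```r
# Synthetic, pooled data; this is NOT the distributed algorithm.
set.seed(42)
n <- 200
X <- cbind(1, rnorm(n), rnorm(n))              # design matrix with intercept
beta_true <- c(-0.5, 1, -1)
y <- rbinom(n, 1, plogis(X %*% beta_true))

tol <- 1e-8           # same defaults as in the Usage section
maxIterations <- 25

beta <- rep(0, ncol(X))
converged <- FALSE
for (iter in seq_len(maxIterations)) {
  p <- as.vector(plogis(X %*% beta))           # fitted probabilities
  W <- p * (1 - p)                             # Newton weights
  gradient <- crossprod(X, y - p)              # score vector X'(y - p)
  hessian  <- crossprod(X, X * W)              # information matrix X'WX
  step <- as.vector(solve(hessian, gradient))  # Newton step
  beta <- beta + step
  # Assumed stopping rule: largest coefficient change is below tol.
  if (max(abs(step)) < tol) {
    converged <- TRUE
    break
  }
}

# If the loop ends without converged == TRUE, non-convergence is declared.
converged
```

A smaller tol demands more iterations before the loop stops, while maxIterations caps the work done when the estimates fail to settle.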

Value

Returns an object of class vdralinear for linear regression, vdralogistic for logistic regression, or vdracox for Cox regression.

See Also

AnalysisCenter.3Party, AnalysisCenter.KParty

Examples

## Not run: 
## 2 party linear regression

# Analysis Center -- To be run in one instance of R.
# The working directory should be the same as specified in the PopMedNet
# request for the analysis center.

fit = AnalysisCenter.2Party(regression = "linear", data = vdra_data[, c(1, 5:7)],
        response = "Change_BMI", monitorFolder = tempdir())

# Data Partner -- To be run in a second instance of R, perhaps on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.

fit = DataPartner.2Party(regression = "linear", data = vdra_data[, 8:11],
        monitorFolder = tempdir())

## 2 party logistic regression

# Analysis Center -- To be run in one instance of R.
# The working directory should be the same as specified in the PopMedNet
# request for the analysis center.

fit = AnalysisCenter.2Party(regression = "logistic", data = vdra_data[, c(2, 5:7)],
        response = "WtLost", monitorFolder = tempdir())

# Data Partner -- To be run in a second instance of R, perhaps on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.

fit = DataPartner.2Party(regression = "logistic", data = vdra_data[, 8:11],
        monitorFolder = tempdir())

## 2 party cox regression

# Analysis Center -- To be run in one instance of R.
# The working directory should be the same as specified in the PopMedNet
# request for the analysis center.

fit = AnalysisCenter.2Party(regression = "cox", data = vdra_data[, c(3:4, 5:7)],
        response = c("Time", "Status"), strata = c("Exposure", "Sex"),
        monitorFolder = tempdir())

# Data Partner -- To be run in a second instance of R, perhaps on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.

fit = DataPartner.2Party(regression = "cox", data = vdra_data[, 8:11],
        strata = c("Exposure", "Sex"), monitorFolder = tempdir())

## End(Not run)

vdra documentation built on Sept. 9, 2021, 9:10 a.m.