distributed3party: Three Party Vertical Distributed Regression Analysis


Description

AnalysisCenter.3Party, DataPartner1.3Party and DataPartner2.3Party are used in conjunction with PopMedNet to perform linear, logistic, or Cox regression on data that has been partitioned vertically between two data partners. The data partner that holds the response variable(s) uses DataPartner1.3Party and the other data partner uses DataPartner2.3Party. Data partners are not allowed to communicate with each other directly; they share information via a trusted third-party analysis center. Any information that a data partner shares with the analysis center, with the exception of some summary statistics, is encrypted by the sending data partner. If that information must be passed on to the other data partner for further analysis, the analysis center encrypts it again. That way, any information derived directly from the raw data that moves between the two data partners is doubly encrypted, which keeps both the analysis center and the other data partner from learning it. Thus, no information is shared between the data partners or the analysis center that would allow one data partner to reconstruct part of the other data partner's data. Final coefficients and other regression statistics are computed by the analysis center and shared with the data partners.
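
To illustrate what "partitioned vertically" means here, the following sketch splits a toy data set by column so that both data partners hold the same observations, in the same row order, but different variables. The data frame `full` and its column names are invented for this illustration and are not part of the package.

```r
# Hypothetical toy data; column names are illustrative only.
full <- data.frame(
  y  = c(1.2, 0.7, 3.1, 2.4),     # response
  x1 = c(34, 51, 29, 46),
  x2 = c(22.1, 30.4, 25.0, 27.8),
  x3 = c(0, 1, 1, 0)
)

# Data partner 1 holds the response plus some of the covariates.
dp1_data <- full[, c("y", "x1")]

# Data partner 2 holds the remaining covariates for the same rows.
dp2_data <- full[, c("x2", "x3")]

# Same observations on both sides, disjoint sets of columns.
stopifnot(nrow(dp1_data) == nrow(dp2_data))
stopifnot(length(intersect(names(dp1_data), names(dp2_data))) == 0)
```

In this layout, `dp1_data` would be the kind of object passed as `data` to DataPartner1.3Party and `dp2_data` the kind passed to DataPartner2.3Party.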

Usage

AnalysisCenter.3Party(regression = "linear", monitorFolder = NULL,
                      msreqid = "v_default_00_000", blocksize = 500,
                      tol = 1e-8, maxIterations = 25, sleepTime = 10,
                      maxWaitingTime = 86400, popmednet = TRUE,
                      trace = FALSE, verbose = TRUE)

DataPartner1.3Party(regression = "linear", data = NULL, response = NULL,
                    strata = NULL, mask = TRUE, monitorFolder = NULL,
                    sleepTime = 10, maxWaitingTime = 86400, popmednet = TRUE,
                    trace = FALSE, verbose = TRUE)

DataPartner2.3Party(regression = "linear", data = NULL, strata = NULL,
                    mask = TRUE, monitorFolder = NULL, sleepTime = 10,
                    maxWaitingTime = 86400, popmednet = TRUE,
                    trace = FALSE, verbose = TRUE)

Arguments

regression

the model to be used to fit the data. The default regression "linear" fits a least squares linear model to the data. Alternatively, "logistic" returns a fitted logistic model, and "cox" returns a fitted Cox proportional hazards model.

data

a data.frame or matrix which contains the data to be used in the model. For DataPartner2.3Party(), all columns will be used as covariates in the regression. For DataPartner1.3Party(), all columns, with the exception of the column(s) specified by response, will be used as covariates in the regression.

response

for "linear" and "logistic" regression, the name of the column in data which holds the response variable. If response = NULL, then the first column of data will be used as the response variable. For "cox" regression, response holds the name of the column which is the time to event and the name of the column which is the event type (0 = censored, 1 = event), in that order. If response = NULL, then the first column of data is assumed to be the time to event and the second column is assumed to be the event type.
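
The defaulting rule described above can be sketched in a few lines of base R. The helper `split_columns` and the toy data frame are hypothetical and only mirror the documented behaviour; they are not the package's internal code.

```r
# Hypothetical helper, not part of the package: split column names into
# response and covariates following the rule described for `response`.
split_columns <- function(data, response = NULL, regression = "linear") {
  # "cox" needs two response columns (time, event); the others need one.
  n_resp <- if (regression == "cox") 2 else 1
  if (is.null(response)) {
    # Default: take the leading column(s) of the data.
    response <- colnames(data)[seq_len(n_resp)]
  }
  stopifnot(length(response) == n_resp,
            all(response %in% colnames(data)))
  list(response   = response,
       covariates = setdiff(colnames(data), response))
}

# Toy data with invented column names.
toy <- data.frame(Time = c(5, 8, 3), Status = c(1, 0, 1),
                  Age = c(40, 55, 61), Dose = c(1, 2, 2))

# With response = NULL, "cox" picks the first two columns.
cox_cols <- split_columns(toy, regression = "cox")

# With an explicit response, every other column becomes a covariate.
lin_cols <- split_columns(toy, response = "Age")
```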

strata

for "cox" regression only. A vector of character strings identifying the names of the covariates from either party which will be used as strata. Both DataPartner1.3Party and DataPartner2.3Party must specify the same vector of strata.

mask

logical value: If FALSE, strata levels for the strata which belong to the party which specified FALSE will be identified by name. If TRUE, levels for the strata which belong to the party which specified TRUE will be put in a random order and level names will be changed to NA.
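
The idea behind masking can be illustrated as follows. This is only a sketch under stated assumptions: the helper `mask_levels` is hypothetical, and it replaces the level names with anonymous integer codes so that the grouping can be checked, whereas the package reports masked level names as NA.

```r
# Illustration only: replace stratum labels by anonymous codes assigned
# in random order, so the grouping survives but the names do not.
mask_levels <- function(x) {
  f <- factor(x)
  codes <- sample(nlevels(f))   # random permutation of 1..nlevels(f)
  codes[as.integer(f)]          # anonymous code for each observation
}

set.seed(42)
sex <- c("F", "M", "F", "M", "F")
masked <- mask_levels(sex)

# The grouping is unchanged: each original label maps to exactly one code.
stopifnot(nrow(unique(data.frame(sex, masked))) == length(unique(sex)))
```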

monitorFolder

the folder where the directories dplocal, inputfiles, macros, msoc, and rprograms are located.

msreqid

a character string specifying the name of the Request ID as specified when creating the Distributed Regression request on PopMedNet. Used for logging purposes only.

blocksize

the minimum size used to horizontally partition the data for data transfer between the two parties.
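
One way to read "minimum size" is sketched below: the rows are cut into nearly equal blocks, each holding at least `blocksize` rows whenever the data has that many. The helper `block_sizes` and the splitting rule are assumptions made for illustration; the package's actual partitioning rule is not documented on this page.

```r
# Hypothetical helper: sizes of row blocks, each at least `blocksize`
# rows (or a single block when there are fewer rows than that).
block_sizes <- function(n, blocksize = 500) {
  k     <- max(1, n %/% blocksize)   # number of blocks
  base  <- n %/% k                   # rows every block receives
  extra <- n %% k                    # leftover rows, one per leading block
  base + as.integer(seq_len(k) <= extra)
}

sizes <- block_sizes(1201, blocksize = 500)

# Row indices belonging to each block.
blocks <- split(seq_len(1201), rep(seq_along(sizes), times = sizes))
```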

tol

the tolerance used to determine convergence in "logistic" and "cox" regression.

maxIterations

the maximum number of iterations to perform "logistic" or "cox" regression before non-convergence is declared.
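
To show how tol and maxIterations typically interact in an iterative fit, here is a minimal Newton-Raphson sketch for logistic regression on ordinary pooled data. It is not the distributed protocol, and the convergence criterion used here (largest absolute change in a coefficient below tol) is an assumption; the function `fit_logistic` is hypothetical.

```r
# Plain Newton-Raphson for logistic regression on pooled data.
# Assumption: convergence means the largest coefficient update is < tol.
fit_logistic <- function(X, y, tol = 1e-8, maxIterations = 25) {
  beta <- rep(0, ncol(X))
  converged <- FALSE
  for (iter in seq_len(maxIterations)) {
    p    <- plogis(drop(X %*% beta))   # fitted probabilities
    w    <- p * (1 - p)                # observation weights
    grad <- crossprod(X, y - p)        # score vector
    hess <- crossprod(X, X * w)        # information matrix X' W X
    step <- drop(solve(hess, grad))    # Newton update
    beta <- beta + step
    if (max(abs(step)) < tol) {
      converged <- TRUE
      break
    }
  }
  list(coefficients = beta, iterations = iter, converged = converged)
}

# Simulated data for the illustration.
set.seed(1)
n <- 200
x <- rnorm(n)
X <- cbind(1, x)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))

fit <- fit_logistic(X, y)
```

If the loop runs out of iterations, `converged` stays FALSE, which corresponds to the non-convergence case mentioned above.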

sleepTime

the number of seconds to wait after writing the last file to disk before signalling the PMN Datamart Client that files are ready to be transferred.

maxWaitingTime

the number of seconds to wait to receive files before a transfer error is declared and the program halts execution. Should be the same for both parties when delayOffset = TRUE.
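
The waiting behaviour can be pictured as a polling loop with a timeout. The sketch below is an assumption about the general pattern, not the package's code; `wait_for_file` and its `pollInterval` argument are hypothetical.

```r
# Hypothetical helper: poll for a file until it appears or until
# maxWaitingTime seconds have elapsed, then stop with an error.
wait_for_file <- function(path, maxWaitingTime = 86400, pollInterval = 1) {
  start <- Sys.time()
  while (!file.exists(path)) {
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (elapsed > maxWaitingTime) {
      stop("Transfer error: '", path, "' not received within ",
           maxWaitingTime, " seconds.")
    }
    Sys.sleep(pollInterval)
  }
  invisible(TRUE)
}

# A file that already exists is found immediately.
incoming <- tempfile(fileext = ".rdata")
file.create(incoming)
found <- wait_for_file(incoming, maxWaitingTime = 5)
```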

popmednet

logical value: if TRUE, assumes that PopMedNet is being used to transfer the files and implements PopMedNet-specific routines. In particular, a 15 second offset between termination of routines that execute in parallel is implemented.

trace

logical value: if TRUE and verbose == TRUE, prints every function call. Used for debugging.

verbose

logical value. If TRUE, prints out information to document the progression of the computation.

Value

Returns an object of class vdralinear for linear regression, vdralogistic for logistic regression, or vdracox for Cox regression.

See Also

AnalysisCenter.2Party, AnalysisCenter.KParty

Examples

## Not run: 
## 3 party linear regression

# Analysis Center -- To be run in one instance of R.
# The working directory should be the same as specified in the PopMedNet
# request for the analysis center.
fit = AnalysisCenter.3Party(regression = "linear", monitorFolder = tempdir())

# Data Partner 1 -- To be run in a second instance of R, possibly on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.
fit = DataPartner1.3Party(regression = "linear", data = vdra_data[, c(1, 5:7)],
          response = "Change_BMI", monitorFolder = tempdir())

# Data Partner 2 -- To be run in a third instance of R, possibly on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.
fit = DataPartner2.3Party(regression = "linear", data = vdra_data[, 8:11],
          monitorFolder = tempdir())

## 3 party logistic regression

# Analysis Center -- To be run in one instance of R.
# The working directory should be the same as specified in the PopMedNet
# request for the analysis center.
fit = AnalysisCenter.3Party(regression = "logistic", monitorFolder = tempdir())

# Data Partner 1 -- To be run in a second instance of R, possibly on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.
fit = DataPartner1.3Party(regression = "logistic", data = vdra_data[, c(2, 5:7)],
          response = "WtLost", monitorFolder = tempdir())

# Data Partner 2 -- To be run in a third instance of R, possibly on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.
fit = DataPartner2.3Party(regression = "logistic", data = vdra_data[, 8:11],
          monitorFolder = tempdir())

## 3 party cox regression

# Analysis Center -- To be run in one instance of R.
# The working directory should be the same as specified in the PopMedNet
# request for the analysis center.
fit = AnalysisCenter.3Party(regression = "cox", monitorFolder = tempdir())

# Data Partner 1 -- To be run in a second instance of R, possibly on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.
fit = DataPartner1.3Party(regression = "cox", data = vdra_data[, c(3:4, 5:7)],
        response = c("Time", "Status"), strata = c("Exposure", "Sex"),
        monitorFolder = tempdir())

# Data Partner 2 -- To be run in a third instance of R, possibly on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.
fit = DataPartner2.3Party(regression = "cox", data = vdra_data[, 8:11],
         strata = c("Exposure", "Sex"), monitorFolder = tempdir())

## End(Not run)

vdra documentation built on Sept. 9, 2021, 9:10 a.m.