distributedKparty: K-Party Vertical Distributed Regression Analysis

Description

AnalysisCenter.KParty and DataPartner.KParty are used in conjunction with PopMedNet to perform linear, logistic, or Cox regression on data that has been partitioned vertically between two or more data partners. The data partners, which hold the data, use DataPartner.KParty, while a trusted "third" party uses AnalysisCenter.KParty. Data partners are allowed to communicate with each other and with the analysis center, but no information is shared that would allow any data partner or the analysis center to reconstruct part of another data partner's data. Final coefficients and other regression statistics are computed by the analysis center and shared with the data partners.
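
To make "partitioned vertically" concrete, here is a minimal base R sketch. It assumes a made-up data set: the object names full, partner1, and partner2 and the column names are hypothetical and not part of the package. Each partner holds every row but only some of the columns. The pooled fit at the end is shown only for comparison and could be computed only if the columns were combined in one place.

```r
# Toy data set; the column names are invented for illustration.
set.seed(1)
full <- data.frame(
  y  = rnorm(6),
  x1 = rnorm(6),
  x2 = rnorm(6),
  x3 = rnorm(6)
)

# Vertical partition: every partner holds all rows but a disjoint set of columns.
partner1 <- full[, c("y", "x1")]   # would use dataPartnerID = 1 (holds the response)
partner2 <- full[, c("x2", "x3")]  # would use dataPartnerID = 2

# Rows must line up across partners: same count, same order.
stopifnot(nrow(partner1) == nrow(partner2))

# Ordinary pooled least squares fit, possible only with all columns in one place.
pooled <- lm(y ~ x1 + x2 + x3, data = cbind(partner1, partner2))
coef(pooled)
```

In an actual run, partner1 and partner2 would each be passed as the data argument of DataPartner.KParty on separate machines, and the columns would never be combined.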

Usage

AnalysisCenter.KParty(regression = "linear", numDataPartners = NULL,
                      monitorFolder = NULL, msreqid = "v_default_00_000",
                      tol = 1e-8, maxIterations = 25, sleepTime = 10,
                      maxWaitingTime = 86400, popmednet = TRUE,
                      trace = FALSE, verbose = TRUE)

DataPartner.KParty(regression = "linear", data = NULL, response = NULL,
                   strata = NULL, mask = TRUE, numDataPartners = NULL,
                   dataPartnerID = NULL, monitorFolder = NULL,
                   sleepTime = 10, maxWaitingTime = 86400, popmednet = TRUE,
                   trace = FALSE, verbose = TRUE)

Arguments

regression

the model to be used to fit the data. The default regression "linear" fits a least squares linear model to the data. Alternatively, "logistic" returns a fitted logistic model, and "cox" returns a fitted Cox proportional hazards model.

data

a data.frame or matrix which contains the data to be used in the model. For every data partner except the one with dataPartnerID = 1, all columns are used as covariates in the regression. For the data partner with dataPartnerID = 1, all columns except the column(s) specified by response are used as covariates.

response

only used for the data partner with dataPartnerID = 1. For "linear" and "logistic" regression, the name of the column in data which holds the response variable. If response = NULL, then the first column of data is used as the response variable. For "cox" regression, response holds the name of the column which is the time to event and the name of the column which is the event type (0 = censored, 1 = event). If response = NULL, then the first column of data is assumed to be the time to event and the second column is assumed to be the event type.

strata

for "cox" regression only. A vector of character strings identifying the names of the covariates from either party which will be used as strata. All data partners must specify the same vector of strata.

mask

logical value: If FALSE, strata levels for the strata which belong to the party which specified FALSE will be identified by name. If TRUE, levels for the strata which belong to the party which specified TRUE will be put in a random order and level names will be changed to NA.

numDataPartners

the number of data partners which are supplying data for the regression.

dataPartnerID

a unique identifier for each data partner. The data partner with the response variable(s) must have dataPartnerID = 1. All other data partners must have an integer value from 2 to numDataPartners.

monitorFolder

the folder where the directories dplocal, inputfiles, macros, msoc, and rprograms are located.

msreqid

a character string specifying the name of the Request ID as specified when creating the Distributed Regression request on PopMedNet. Used for logging purposes only.

tol

the tolerance used to determine convergence in "logistic" and "cox" regression.

maxIterations

the maximum number of iterations to perform "logistic" or "cox" regression before non-convergence is declared.

sleepTime

the number of seconds to wait after writing the last file to disk before signalling the PMN Datamart Client that files are ready to be transferred.

maxWaitingTime

the number of seconds to wait to receive files before a transfer error is declared and the program halts execution. Should be the same for all parties when delayOffset = TRUE.

popmednet

logical value: if TRUE, assumes that PopMedNet is being used to transfer the files and implements PopMedNet-specific routines. In particular, a 15-second offset between termination of routines that execute in parallel is implemented.

trace

logical value: if TRUE and verbose == TRUE, prints every function call. Used for debugging.

verbose

logical value. If TRUE, prints out information to document the progression of the computation.
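
The roles of tol and maxIterations can be illustrated with an ordinary single-machine Newton-Raphson fit of a logistic model. This is only a sketch under stated assumptions: the function fit_logistic is hypothetical, and the stopping rule used here (largest absolute change in any coefficient falls below tol) is an assumption, since this page does not say which convergence criterion the package applies.

```r
# Newton-Raphson for logistic regression on a single machine.
# fit_logistic and its stopping rule are illustrative assumptions,
# not the vdra implementation.
fit_logistic <- function(X, y, tol = 1e-8, maxIterations = 25) {
  beta <- rep(0, ncol(X))
  converged <- FALSE
  for (iter in seq_len(maxIterations)) {
    p <- as.vector(1 / (1 + exp(-X %*% beta)))  # fitted probabilities
    W <- p * (1 - p)                            # per-row weights
    # Newton step: solve (X' W X) step = X' (y - p)
    step <- as.vector(solve(crossprod(X, W * X), crossprod(X, y - p)))
    beta <- beta + step
    if (max(abs(step)) < tol) {                 # assumed convergence test
      converged <- TRUE
      break
    }
  }
  list(coefficients = beta, iterations = iter, converged = converged)
}

# Simulated data: intercept plus one covariate.
set.seed(42)
n <- 200
X <- cbind(1, rnorm(n))
y <- rbinom(n, 1, as.vector(plogis(X %*% c(-0.5, 1))))

res <- fit_logistic(X, y)
ref <- unname(coef(glm(y ~ X[, 2], family = binomial())))

# Capping the iterations too low leaves the fit unconverged.
capped <- fit_logistic(X, y, maxIterations = 1)
```

Here res$converged is TRUE and res$coefficients agrees closely with the glm fit, while capped$converged is FALSE because a single iteration cannot satisfy the tolerance. A smaller tol demands more iterations; a smaller maxIterations declares non-convergence sooner.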

Value

Returns an object of class vdralinear for linear regression, vdralogistic for logistic regression, or vdracox for cox regression.

See Also

AnalysisCenter.2Party, AnalysisCenter.KParty

Examples

## Not run: 
## 3 party linear regression

# Analysis Center -- To be run in one instance of R.
# The working directory should be the same as specified in the PopMedNet
# request for the analysis center.
fit = AnalysisCenter.KParty(regression = "linear", numDataPartners = 2,
              monitorFolder = tempdir())

# Data Partner 1 -- To be run in a second instance of R, possibly on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.
fit = DataPartner.KParty(regression = "linear", data = vdra_data[, c(1, 5:7)],
          response = "Change_BMI", numDataPartners = 2, dataPartnerID = 1,
          monitorFolder = tempdir())

# Data Partner 2 -- To be run in a third instance of R, possibly on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.
fit = DataPartner.KParty(regression = "linear", data = vdra_data[, 8:11],
          numDataPartners = 2, dataPartnerID = 2, monitorFolder = tempdir())

## 3 party logistic regression

# Analysis Center -- To be run in one instance of R.
# The working directory should be the same as specified in the PopMedNet
# request for the analysis center.
fit = AnalysisCenter.KParty(regression = "logistic", numDataPartners = 2,
              monitorFolder = tempdir())

# Data Partner 1 -- To be run in a second instance of R, possibly on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.
fit = DataPartner.KParty(regression = "logistic", data = vdra_data[, c(2, 5:7)],
          response = "WtLost", numDataPartners = 2, dataPartnerID = 1,
          monitorFolder = tempdir())

# Data Partner 2 -- To be run in a third instance of R, possibly on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.
fit = DataPartner.KParty(regression = "logistic", data = vdra_data[, 8:11],
          numDataPartners = 2, dataPartnerID = 2, monitorFolder = tempdir())

## 3 party cox regression

# Analysis Center -- To be run in one instance of R.
# The working directory should be the same as specified in the PopMedNet
# request for the analysis center.
fit = AnalysisCenter.KParty(regression = "cox", numDataPartners = 2,
              monitorFolder = tempdir())

# Data Partner 1 -- To be run in a second instance of R, possibly on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.
fit = DataPartner.KParty(regression = "cox", data = vdra_data[, c(3:4, 5:7)],
        response = c("Time", "Status"), strata = c("Exposure", "Sex"),
        numDataPartners = 2, dataPartnerID = 1, monitorFolder = tempdir())

# Data Partner 2 -- To be run in a third instance of R, possibly on a different machine.
# The working directory should be the same as specified in the PopMedNet
# request for the data partner.
fit = DataPartner.KParty(regression = "cox", data = vdra_data[, 8:11],
         strata = c("Exposure", "Sex"), numDataPartners = 2, dataPartnerID = 2,
         monitorFolder = tempdir())

## End(Not run)

vdra documentation built on Sept. 9, 2021, 9:10 a.m.