rfsrc.anonymous: Anonymous Random Forests
In randomForestSRC: Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)

rfsrc.anonymous

R Documentation

Anonymous Random Forests

Description

An anonymous random forest is a forest that has been carefully modified so that the original training data is not retained. This enables users to share the trained forest with others without disclosing the underlying data.

Usage

rfsrc.anonymous(formula, data, forest = TRUE, ...)

Arguments

`formula`	A symbolic description of the model to be fit. If missing, unsupervised splitting is performed.
`data`	A data frame containing the y-outcome and x-variables.
`forest`	Logical. Should the forest object be returned? Required for prediction on new data and by many other package functions.
`...`	Additional arguments passed to `rfsrc`. See the `rfsrc` help file for full details.

Details

This function calls rfsrc and returns a forest object with the original training data removed. This enables users to share their forest while preserving the privacy of their data.
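The sharing workflow described here might look like the following minimal sketch. It assumes the forest is passed along as an RDS file; the temporary file path and the use of base R's `saveRDS`/`readRDS` are illustrative choices, not something the package prescribes.

```r
library(randomForestSRC)

## owner's side: grow an anonymous forest on part of mtcars
o <- rfsrc.anonymous(mpg ~ ., mtcars[1:24, ], ntree = 50)

## serialize the forest; the file path is a placeholder for illustration
f <- tempfile(fileext = ".rds")
saveRDS(o, f)

## recipient's side: restore the forest and predict on new data
o.shared <- readRDS(f)
p <- predict(o.shared, mtcars[25:32, ])
print(p)
```

The recipient needs only the serialized forest and their own test data, which must use the same variable names as the training data.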

To enable prediction on new (test) data, certain minimal information from the training data must still be retained. This includes:

Names of the original variables.
For factor variables, the levels of each factor.
Summary statistics used for imputation: the mean for continuous variables and the most frequent class for factors.
Tree topology, including split points used to grow the trees.

For maximal privacy, users are strongly encouraged to replace variable names with non-identifiable labels and convert all variables to continuous format when possible. If factor variables are used, their levels should also be anonymized. However, the user is solely responsible for de-identifying the data and verifying that privacy is maintained. We provide NO GUARANTEES regarding data confidentiality.
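One way to follow this advice is to relabel the data before growing the forest. The base R sketch below replaces column names with generic labels and recodes factor levels; the helper `anonymize_frame` and the label schemes `v1, v2, ...` and `L1, L2, ...` are hypothetical and not part of the package.

```r
## hypothetical helper: replace column names and factor levels with
## generic labels, returning the relabeled data and the private keys
anonymize_frame <- function(data) {
  var.key <- setNames(names(data), paste0("v", seq_along(data)))
  lev.key <- list()
  names(data) <- names(var.key)
  for (nm in names(data)) {
    if (is.factor(data[[nm]])) {
      old <- levels(data[[nm]])
      ## levels are relabeled by position: first level -> L1, and so on
      levels(data[[nm]]) <- paste0("L", seq_along(old))
      lev.key[[nm]] <- setNames(old, levels(data[[nm]]))
    }
  }
  list(data = data, var.key = var.key, lev.key = lev.key)
}

a <- anonymize_frame(iris)
str(a$data)
a$var.key   ## maps generic names back to the originals; keep private
a$lev.key   ## maps generic factor levels back to the originals
```

The relabeled frame can then be passed to rfsrc.anonymous with the formula written in the new names (here `v5 ~ .`). Test data must be relabeled with the same keys before prediction, and relabeling alone does not guarantee privacy.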

Missing data handling: Anonymous forests do not support imputation of training data. The option na.action = "na.impute" is automatically downgraded to "na.omit". If training data contain missing values, we recommend pre-imputing them using impute.
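A minimal sketch of the recommended pre-imputation step, using the pbc data from the package. The default settings of impute are assumed to be adequate here, and `ntree = 100` is an arbitrary choice to keep the run short.

```r
library(randomForestSRC)
data(pbc, package = "randomForestSRC")

## pre-impute the training data (default settings, no formula supplied)
pbc.d <- impute(data = pbc)

## grow the anonymous forest on the completed data
o <- rfsrc.anonymous(Surv(days, status) ~ ., pbc.d, ntree = 100)
print(o)
```

Because the data are completed beforehand, no cases are lost to the "na.omit" downgrade.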

Test data, however, can be imputed at prediction time:

na.action = "na.impute" performs a fast imputation by replacing missing values with the training mean (for numeric variables) or most frequent class (for factors).
na.action = "na.random" uses a fast random draw from training distributions for imputation.
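To make the "na.impute" rule concrete, the base R sketch below mimics mean and most-frequent-class replacement on a toy data frame. It only illustrates the rule as described above and is not the package's internal code; the function `strawman_impute` and the toy data are hypothetical. An anonymous forest stores only the summary statistics, whereas this sketch computes them from a training frame for clarity.

```r
## hypothetical illustration of mean / most-frequent-class imputation
strawman_impute <- function(test, train) {
  for (nm in names(test)) {
    miss <- is.na(test[[nm]])
    if (!any(miss)) next
    if (is.factor(train[[nm]])) {
      ## most frequent class in the training data
      test[[nm]][miss] <- names(which.max(table(train[[nm]])))
    } else {
      ## training mean
      test[[nm]][miss] <- mean(train[[nm]], na.rm = TRUE)
    }
  }
  test
}

train <- data.frame(x = c(1, 2, 6), g = factor(c("a", "b", "b")))
test  <- data.frame(x = c(NA, 4),
                    g = factor(c("a", NA), levels = c("a", "b")))
res <- strawman_impute(test, train)
print(res)
```

In this toy case the missing `x` is filled with the training mean and the missing `g` with the most frequent training level.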

Although anonymous forests are compatible with many package functions, they are only guaranteed to work with functions that do not explicitly require access to the original training data.

Value

An object of class (rfsrc, grow, anonymous).

Author(s)

Hemant Ishwaran and Udaya B. Kogalur

Examples



## ------------------------------------------------------------
## regression
## ------------------------------------------------------------
print(rfsrc.anonymous(mpg ~ ., mtcars))

## ------------------------------------------------------------
## plot anonymous regression tree (using get.tree)
## TBD CURRENTLY NOT IMPLEMENTED 
## ------------------------------------------------------------
## plot(get.tree(rfsrc.anonymous(mpg ~ ., mtcars), 10))

## ------------------------------------------------------------
## classification
## ------------------------------------------------------------
print(rfsrc.anonymous(Species ~ ., iris))

## ------------------------------------------------------------
## survival
## ------------------------------------------------------------
data(veteran, package = "randomForestSRC")
print(rfsrc.anonymous(Surv(time, status) ~ ., data = veteran))

## ------------------------------------------------------------
## competing risks
## ------------------------------------------------------------
data(wihs, package = "randomForestSRC")
print(rfsrc.anonymous(Surv(time, status) ~ ., wihs, ntree = 100))

## ------------------------------------------------------------
## unsupervised forests
## ------------------------------------------------------------
print(rfsrc.anonymous(data = iris))

## ------------------------------------------------------------
## multivariate regression
## ------------------------------------------------------------
print(rfsrc.anonymous(Multivar(mpg, cyl) ~ ., data = mtcars))

## ------------------------------------------------------------
## prediction on test data with missing values using pbc data
## cases 1 to 312 have no missing values
## cases 313 to 418 have missing values
## ------------------------------------------------------------
data(pbc, package = "randomForestSRC")
pbc.obj <- rfsrc.anonymous(Surv(days, status) ~ ., pbc)
print(pbc.obj)

## mean value imputation
print(predict(pbc.obj, pbc[-(1:312),], na.action = "na.impute"))

## random imputation
print(predict(pbc.obj, pbc[-(1:312),], na.action = "na.random"))

## ------------------------------------------------------------
## train/test setting but tricky because factor labels differ over
## training and test data
## ------------------------------------------------------------

# first we convert all x-variables to factors
data(veteran, package = "randomForestSRC")
veteran.factor <- data.frame(lapply(veteran, factor))
veteran.factor$time <- veteran$time
veteran.factor$status <- veteran$status

# split the data into train/test data (50/50)
# the train/test data have the same levels, but different labels
train <- sample(1:nrow(veteran), round(nrow(veteran) * .5))
summary(veteran.factor[train, ])
summary(veteran.factor[-train, ])

# grow the forest on the training data and predict on the test data
v.grow <- rfsrc.anonymous(Surv(time, status) ~ ., veteran.factor[train, ]) 
v.pred <- predict(v.grow, veteran.factor[-train, ])
print(v.grow)
print(v.pred)

randomForestSRC documentation built on June 8, 2025, 1:12 p.m.
