ml_survreg: Accelerated Failure Time (AFT) Survival Regression Model

View source: R/ml_regression.R

Description

ml_survreg fits an accelerated failure time (AFT) survival regression model on a spark_tbl. Users can call summary to get a summary of the fitted AFT model, predict to make predictions on new data, and write_ml/read_ml to save/load fitted models.

Usage

ml_survreg(
  data,
  formula,
  aggregationDepth = 2,
  stringIndexerOrderType = c("frequencyDesc", "frequencyAsc", "alphabetDesc",
    "alphabetAsc")
)

## S4 method for signature 'AFTSurvivalRegressionModel'
summary(object)

## S4 method for signature 'AFTSurvivalRegressionModel'
predict(object, newData)

## S4 method for signature 'AFTSurvivalRegressionModel,character'
write_ml(object, path, overwrite = FALSE)

Arguments

data

a spark_tbl for training.

formula

a symbolic description of the model to be fitted. Only a few formula operators are currently supported, including '~', ':', '+', and '-'. The '.' operator is not supported.
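As a rough illustration of how these operators combine terms, the sketch below uses base R's terms() rather than Spark. It assumes ml_survreg interprets '+', ':' and '-' the same way R does, which this page does not state explicitly. The column names are borrowed from the example at the end of this page.

```r
# Base R only: terms() parses the formula without evaluating Surv(),
# so neither Spark nor the survival package is needed here.
f_full <- Surv(futime, fustat) ~ ecog_ps + rx + ecog_ps:rx
labels_full <- attr(terms(f_full), "term.labels")

# '-' removes a term that was added earlier in the formula.
f_reduced <- Surv(futime, fustat) ~ ecog_ps + rx + ecog_ps:rx - rx
labels_reduced <- attr(terms(f_reduced), "term.labels")

print(labels_full)
print(labels_reduced)
```

Here ':' adds the interaction between ecog_ps and rx, and '- rx' drops the rx main effect while leaving the interaction in place.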

aggregationDepth

The depth for treeAggregate (greater than or equal to 2). If the feature dimension or the number of partitions is large, this parameter can be increased. This is an expert parameter; the default value should be good for most cases.

stringIndexerOrderType

how to order the categories of a string feature column. The ordering determines the base level of a string feature, because the last category after ordering is dropped when strings are encoded. Supported options are "frequencyDesc", "frequencyAsc", "alphabetDesc", and "alphabetAsc". The default value is "frequencyDesc". When the ordering is set to "alphabetDesc", the dropped category is the same one R drops when encoding strings.
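The effect of each option on the base level can be sketched in base R. The helper order_categories below is hypothetical and not part of tidyspark; it only mimics the ordering rules described above, and it assumes that ties in frequency are broken alphabetically.

```r
# Hypothetical helper (not a tidyspark function): order the distinct values
# of a character vector according to a stringIndexerOrderType option.
order_categories <- function(x, type = c("frequencyDesc", "frequencyAsc",
                                         "alphabetDesc", "alphabetAsc")) {
  type <- match.arg(type)
  counts <- table(x)
  labels <- names(counts)
  n <- as.integer(counts)
  switch(type,
    # Assumption: equal frequencies are ordered alphabetically.
    frequencyDesc = labels[order(-n, labels)],
    frequencyAsc  = labels[order(n, labels)],
    alphabetDesc  = sort(labels, decreasing = TRUE),
    alphabetAsc   = sort(labels)
  )
}

rx <- c("b", "a", "b", "c", "b", "a")

# The last category after ordering is the one dropped (the base level).
ordered <- order_categories(rx, "alphabetDesc")
base_level <- ordered[length(ordered)]
print(base_level)
```

With "alphabetDesc" the base level is the alphabetically first category, which matches R's default treatment contrasts. With the default "frequencyDesc" it is the least frequent category instead.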

object

a fitted AFT survival regression model.

newData

a spark_tbl for testing.

path

the directory where the model is saved.

overwrite

whether to overwrite if the output path already exists. The default is FALSE, which means an exception is thrown if the output path exists.

...

additional arguments passed to the method.

Value

ml_survreg returns a fitted AFT survival regression model.

summary returns a list of summary information about the fitted model. The list includes the model's coefficients (features, coefficients, intercept and log(scale)).

predict returns a spark_tbl containing predicted values on the original scale of the data (mean predicted value at scale = 1.0).
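To make "the original scale of the data" concrete, the sketch below reproduces the calculation in base R. It assumes predict follows Spark ML's AFTSurvivalRegressionModel, which exponentiates the linear predictor; the intercept, coefficients and input row are hypothetical values chosen for illustration, not output from a real fit.

```r
# Hypothetical fitted values, for illustration only.
intercept <- 2.0
coefficients <- c(ecog_ps = -0.5, rx = 0.25)

# One hypothetical row of new data, with the same feature names.
new_row <- c(ecog_ps = 1, rx = 2)

# The linear predictor is on the log-time scale.
linear_predictor <- intercept + sum(coefficients * new_row[names(coefficients)])

# Exponentiating maps it back to the original time scale.
prediction <- exp(linear_predictor)
print(prediction)
```

The log(scale) value reported by summary does not enter this calculation, which is one way to read the "scale = 1.0" remark above.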

Note

ml_survreg since 2.0.0

summary(AFTSurvivalRegressionModel) since 2.0.0

predict(AFTSurvivalRegressionModel) since 2.0.0

write_ml(AFTSurvivalRegressionModel, character) since 2.0.0

See Also

survival: https://cran.r-project.org/package=survival

write_ml

Examples

## Not run: 
df <- spark_tbl(ovarian)
model <- ml_survreg(df, Surv(futime, fustat) ~ ecog_ps + rx)

# get a summary of the model
summary(model)

# make predictions
predicted <- predict(model, df)
show(predicted)

# save and load the model
path <- "path/to/model"
write_ml(model, path)

savedModel <- read_ml(path)
summary(savedModel)

## End(Not run)

danzafar/tidyspark documentation built on Sept. 30, 2020, 12:19 p.m.