datadist | R Documentation |
For a given set of variables or a data frame, determines summaries
of variables for effect and plotting ranges, values to adjust to,
and overall ranges for Predict, plot.Predict, ggplot.Predict,
summary.rms, survplot, and nomogram.rms.
If datadist is called before a model fit and the resulting object is
pointed to with options(datadist="name"), the data characteristics will
be stored with the fit by Design(), so that later predictions and
summaries of the fit will not need to access the original data used in
the fit. Alternatively, you can specify the values for each variable in
the model when using these 3 functions, or specify the values of some of
them and let the functions look up the remainder (of, say, adjustment
levels) from an object created by datadist.
The best method is probably to run datadist once before any models are
fitted, storing the distribution summaries for all potential variables.
Adjustment values are 0 for binary variables, the most frequent
category (or optionally the first category level) for categorical
(factor) variables, the middle level for ordered factor variables, and
medians for continuous variables. See descriptions of q.display and
q.effect for how display and effect ranges are chosen for continuous
variables.
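The default adjustment values described above can be approximated in base R. The sketch below is illustrative only and is not the code rms uses internally: the helper adjust_to, the toy vectors, and the rule used to pick the "middle" level of an ordered factor are all assumptions made for this example.

```r
# Illustrative only: base-R approximations of the documented defaults,
# not the internal implementation of datadist.
adjust_to <- function(x, adjto.cat = c("mode", "first")) {
  adjto.cat <- match.arg(adjto.cat)
  if (is.ordered(x)) {
    lev <- levels(x)
    return(lev[ceiling(length(lev) / 2)])   # middle level (assumed rule)
  }
  if (is.factor(x)) {
    if (adjto.cat == "first") return(levels(x)[1])
    return(names(which.max(table(x))))      # most frequent category
  }
  ux <- unique(x[!is.na(x)])
  if (length(ux) == 2 && all(ux %in% c(0, 1))) return(0)  # binary 0/1
  median(x, na.rm = TRUE)                   # continuous variable
}

# Hypothetical toy data
age   <- c(31, 45, 52, 60, 74)
sex   <- factor(c("f", "m", "f", "f", "m"))
stage <- factor(c("I", "II", "III", "II", "I"),
                levels = c("I", "II", "III"), ordered = TRUE)
treat <- c(0, 1, 1, 0, 1)

adjust_to(age)     # median of age
adjust_to(sex)     # most frequent level of sex
adjust_to(stage)   # middle level of stage
adjust_to(treat)   # 0 for a binary variable

# Effect range for a continuous variable under the default q.effect
effect_range <- quantile(age, c(0.25, 0.75), names = FALSE)
```

In practice datadist computes and stores these summaries itself; the sketch only makes the stated defaults concrete.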
datadist(..., data, q.display, q.effect=c(0.25, 0.75),
adjto.cat=c('mode','first'), n.unique=10)
## S3 method for class 'datadist'
print(x, ...)
# options(datadist="dd")
# used by summary, plot, survplot, sometimes predict
# For dd substitute the name of the result of datadist
... |
a list of variable names, separated by commas, a single data frame, or
a fit with |
data |
a data frame or a search position. If |
q.display |
set of two quantiles for computing the range of continuous variables
to use in displaying regression relationships. Defaults are
|
q.effect |
set of two quantiles for computing the range of continuous variables to use in estimating regression effects. Defaults are c(.25,.75), which yields inter-quartile-range odds ratios, etc. |
adjto.cat |
default is |
n.unique |
variables having |
x |
result of |
For categorical variables, the 7 limits are set to character strings
(factors) which correspond to c(NA,adjto.level,NA,1,k,1,k), where k
is the number of levels.
For ordered variables with numeric levels, the limits are set to
c(L,M,H,L,H,L,H), where L is the lowest level, M is the middle
level, and H is the highest level.
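As a rough illustration of the two patterns above, the following base-R sketch builds the seven limits by hand for a hypothetical unordered factor and a hypothetical ordered numeric variable. The position labels in pos are assumed to match the row names of the limits component (the examples refer to one of them, "Adjust to"), and the rule for choosing the middle level is likewise an assumption.

```r
# Position labels assumed to match the row names of the limits component.
pos <- c("Low:effect", "Adjust to", "High:effect",
         "Low:prediction", "High:prediction", "Low", "High")

# Unordered factor with k levels: c(NA, adjto.level, NA, 1, k, 1, k),
# written out as level labels.
race <- factor(c("a", "b", "b", "c", "b"))     # hypothetical data
lev  <- levels(race)
k    <- length(lev)
adjto.level <- names(which.max(table(race)))   # 'mode' default
cat_limits  <- setNames(
  c(NA, adjto.level, NA, lev[1], lev[k], lev[1], lev[k]), pos)

# Ordered variable with numeric levels: c(L, M, H, L, H, L, H).
grade <- c(1, 2, 3, 4, 5)                      # hypothetical data
u <- sort(unique(grade))
L <- min(u)
H <- max(u)
M <- u[ceiling(length(u) / 2)]                 # middle level (assumed rule)
ord_limits <- setNames(c(L, M, H, L, H, L, H), pos)

cat_limits
ord_limits
```

For a categorical variable the effect limits are NA because there is no low-to-high contrast to report, while the prediction and overall limits span the first through the k-th level.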
a list of class "datadist" with the following components
limits |
a |
values |
a named list, with one vector of unique values for each numeric
variable having no more than |
Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com
rms, rms.trans, describe, Predict, summary.rms
## Not run:
d <- datadist(data=1) # use all variables in search pos. 1
d <- datadist(x1, x2, x3)
page(d) # if your options(pager) leaves up a pop-up
# window, this is a useful guide in analyses
d <- datadist(data=2) # all variables in search pos. 2
d <- datadist(data=my.data.frame)
d <- datadist(my.data.frame) # same as previous. Run for all potential vars.
d <- datadist(x2, x3, data=my.data.frame) # combine variables
d <- datadist(x2, x3, q.effect=c(.1,.9), q.display=c(0,1))
# uses inter-decile range odds ratios,
# total range of variables for regression function plots
d <- datadist(d, z) # add a new variable to an existing datadist
options(datadist="d") #often a good idea, to store info with fit
f <- ols(y ~ x1*x2*x3)
options(datadist=NULL) #default at start of session
f <- ols(y ~ x1*x2)
d <- datadist(f) #info not stored in `f'
d$limits["Adjust to","x1"] <- .5 #reset adjustment level to .5
options(datadist="d")
f <- lrm(y ~ x1*x2, data=mydata)
d <- datadist(f, data=mydata)
options(datadist="d")
f <- lrm(y ~ x1*x2) #datadist not used - specify all values for
summary(f, x1=c(200,500,800), x2=c(1,3,5)) # obtaining predictions
plot(Predict(f, x1=200:800, x2=3)) # or ggplot()
# Change reference value to get a relative odds plot for a logistic model
d$limits$age[2] <- 30 # make 30 the reference value for age
# Could also do: d$limits["Adjust to","age"] <- 30
fit <- update(fit) # make new reference value take effect
plot(Predict(fit, age, ref.zero=TRUE, fun=exp),
ylab='Age=x:Age=30 Odds Ratio') # or ggplot()
## End(Not run)