Carries out a search partition analysis (SPAN).
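formula
A formula of the standard form y ~ x + u + v + w + ..., where y is the outcome (numeric, or a survival object created with Surv()) and x, u, v, w, ... are the candidate attributes.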
data
A data frame containing the variables in the formula.
weight
A frequency weight attached to each row of data. The default, NA, gives each row unit weight.
cc
Indicates complete-case analysis. If TRUE, a row of data is deleted if any one attribute is missing. If FALSE (the default), a case is deleted only if an attribute in the Boolean combination being evaluated during the search is missing.
makepos
If TRUE and an attribute x is found to be negatively associated with y, the direction of x is reversed. The rule for reversal is that the mean of y given x = 1 is less than the mean of y given x = 0. When
beta
Parameter controlling the degree of complexity penalisation. Zero gives no complexity penalty. NA (the default) or a negative value sets beta automatically to 0.03 times the initial gradient of the complexity hull.
size
Defines the maximum allowable size parameters of the disjunctive normal form used in the initial iteration of a search. It is a vector of length q, such as c(1,2,2), defining p_1, p_2, ..., p_q. Default
gamma
Parameter controlling the balance of observations between A and its complement !A. The default, NA, corresponds to no balancing. Balancing forms a new optimization criterion by multiplying either the MSE reduction or the log-rank statistic by (P_A(1 - P_A))^γ, where P_A is the proportion of the data in A.
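As a rough illustration of the balancing rule, the sketch below multiplies a split criterion by (P_A(1 - P_A))^γ. The helper `balance_criterion` is hypothetical and not part of spanr; it assumes unit weights, so P_A is simply the fraction of rows in A.

```r
# Hypothetical helper (not part of spanr): applies the gamma balancing factor
# (P_A * (1 - P_A))^gamma to a split criterion such as an MSE reduction or a
# log-rank chi-square. Assumes unit weights, so P_A is the fraction of rows in A.
balance_criterion <- function(criterion, inA, gamma = NA) {
  if (is.na(gamma)) {
    return(criterion)  # NA: no balancing, criterion unchanged
  }
  pA <- mean(as.logical(inA))  # proportion of the data in A
  criterion * (pA * (1 - pA))^gamma
}

inA <- c(TRUE, TRUE, FALSE, FALSE)     # a perfectly balanced split, P_A = 0.5
balance_criterion(10, inA, gamma = 1)  # 10 * 0.25 = 2.5
balance_criterion(10, inA)             # gamma = NA: returns 10
```

Because P_A(1 - P_A) is largest at P_A = 0.5, increasing γ penalises splits that put most of the data on one side more heavily relative to balanced splits.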
spanr searches for an optimal partition of the data defined by a Boolean combination of the attributes. Optimization is with respect to the reduction in mean square error (MSE) of y achieved by splitting the data into the partition (A, !A) or, if y is a survival object, with respect to the log-rank chi-square statistic for the survival difference between A and !A.
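For a numeric outcome, the criterion can be sketched as the drop in mean square error when y is summarised by separate means in A and !A instead of one overall mean. The helper `mse_reduction` is hypothetical, ignores frequency weights, and is not necessarily how spanr computes the criterion internally. For a survival outcome, the analogous log-rank chi-square is, for example, the `chisq` component returned by `survival::survdiff`.

```r
# Hypothetical helper (not spanr's code): reduction in mean square error of y
# when the data are split into A (inA TRUE) and !A (inA FALSE). Unit weights.
mse_reduction <- function(y, inA) {
  inA <- as.logical(inA)
  sse <- function(v) sum((v - mean(v))^2)  # sum of squared deviations; 0 if empty
  (sse(y) - sse(y[inA]) - sse(y[!inA])) / length(y)
}

y <- c(10, 10, 12, 12)
mse_reduction(y, c(FALSE, FALSE, TRUE, TRUE))  # returns 1: the split explains all variation
mse_reduction(y, c(TRUE, FALSE, TRUE, FALSE))  # returns 0: both halves have the overall mean
```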
The Boolean expression for A is output in disjunctive normal form, A = g_1 | g_2 | g_3 | ..., and the Boolean expression for the complement !A is likewise output in disjunctive normal form, !A = h_1 | h_2 | h_3 | .... Each element of a disjunctive form, g_i of A or h_i of !A, represents a subgroup. Subgroups are returned as data frames.
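To make the disjunctive normal forms concrete, the sketch below takes the subgroups g_1 = x1 & x2 & x3, g_2 = x1 & x4 and g_3 = x1 & x9 from the simulation in the examples, builds A = g_1 | g_2 | g_3, and checks it against a complement !A = h_1 | h_2 | h_3 derived by hand with De Morgan's laws. The toy data and the column names g1, g2, h1, ... are illustrative assumptions; spanr's own output may name and order subgroups differently.

```r
# Illustrative only: toy binary data for the attributes used in Example 1.
d <- data.frame(x1 = c(1, 1, 1, 0), x2 = c(1, 0, 0, 1), x3 = c(1, 0, 0, 1),
                x4 = c(0, 1, 0, 1), x9 = c(0, 0, 1, 1))

# Subgroups of A: each g_i is a conjunction (AND) of attributes.
g <- with(d, data.frame(g1 = x1 & x2 & x3,
                        g2 = x1 & x4,
                        g3 = x1 & x9))

# A is the disjunction (OR) of its subgroups: A = g1 | g2 | g3.
A <- data.frame(A = Reduce(`|`, as.list(g)))

# Subgroups of !A, derived by hand with De Morgan's laws:
# !A = !x1 | (!x2 & !x4 & !x9) | (!x3 & !x4 & !x9)
h <- with(d, data.frame(h1 = !x1,
                        h2 = !x2 & !x4 & !x9,
                        h3 = !x3 & !x4 & !x9))

A$A                                       # TRUE TRUE TRUE FALSE
identical(Reduce(`|`, as.list(h)), !A$A)  # TRUE
```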
If the variables x, u, v, w, ... of the formula are not coded as binary, a pre-analysis is done to establish an optimal cut of each such variable. This is again done with respect to the reduction in MSE (or the log-rank statistic for a survival formula), over the values of the variable. For a numeric variable, a dichotomy is made by being above or below a cut, the possible cuts being the unique values of the variable if there are 20 or fewer, and otherwise points at 20 equally spaced intervals. For a factor variable, the dichotomy is made according to each value of the factor.
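The pre-analysis can be sketched for a numeric outcome as a search over candidate dichotomies that keeps the one with the largest MSE reduction. This is a hypothetical illustration, not spanr's implementation, and several details are assumptions: "above" is taken to mean x > cut, with more than 20 unique values the candidates are the 19 interior break points of 20 equal-width intervals over the range, and for a factor each level is tried against all other levels.

```r
# Hypothetical sketch of the cut-point pre-analysis (not spanr's code).
mse_reduction <- function(y, inA) {
  inA <- as.logical(inA)
  sse <- function(v) sum((v - mean(v))^2)  # 0 for an empty group
  (sse(y) - sse(y[inA]) - sse(y[!inA])) / length(y)
}

best_split <- function(x, y) {
  if (is.factor(x)) {
    # Assumed: each level is tried as "x == level" versus all other levels
    cands <- levels(x)
    gain <- vapply(cands, function(l) mse_reduction(y, x == l), numeric(1))
  } else {
    ux <- sort(unique(x))
    cands <- if (length(ux) <= 20) {
      ux[-length(ux)]  # cutting at the maximum would leave "above" empty
    } else {
      seq(min(x), max(x), length.out = 21)[2:20]  # interior break points
    }
    gain <- vapply(cands, function(cut) mse_reduction(y, x > cut), numeric(1))
  }
  # which.max returns the first candidate if several tie
  list(split = cands[which.max(gain)], reduction = unname(max(gain)))
}

best_split(c(1, 2, 3, 4, 5, 6), c(10, 10, 10, 12, 12, 12))$split  # 3
best_split(factor(c("a", "a", "b", "c")), c(12, 12, 10, 10))$split  # "a"
```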
A spanr object with the following components:
A
A data frame with as many rows as the input data, giving a binary indicator of membership of A.
g
A data frame with as many rows as the input data, whose columns indicate membership of each subgroup g_i of A.
h
A data frame with as many rows as the input data, whose columns indicate membership of each subgroup h_i of !A.
Roger Marshall <rj.marshall@auckland.ac.nz>, The University of Auckland, New Zealand
## 1. Simulate Bernoulli binary predictors x1, x2, ..., x10, and outcome y
## For (x1 & x2 & x3) | (x1 & x4) | (x1 & x9), make y ~ N(11, 0.5), and N(10, 0.5) otherwise.
set.seed(1)  # make the simulation reproducible
x <- matrix(data = rbinom(10000, 1, 0.5), nrow = 1000, ncol = 10)
colnames(x) <- paste0("x", 1:10)
# P = 1 for rows in the true subgroup structure, 0 otherwise
P <- ifelse((x[, 1] & x[, 2] & x[, 3]) | (x[, 1] & x[, 4]) | (x[, 1] & x[, 9]), 1, 0)
y <- ifelse(P, rnorm(1000, 11, 0.5), rnorm(1000, 10, 0.5))
d <- data.frame(cbind(y, x))
sp <- spanr(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
            data = d, size = c(1, 2, 2), beta = NA)
## 2. Survival analysis of pbc data
library(survival)
data(pbc, package = "survival")
sp <- with(pbc, spanr(formula = Surv(time, status == 2) ~ trt + age + sex + ascites +
                        hepato + spiders + edema + bili + chol + albumin +
                        copper + ast + trig + platelet + protime + stage,
                      beta = NA, cc = TRUE, gamma = 1))
test <- cbind(pbc, sp$A)
## Kaplan-Meier curves of A versus !A
km <- survfit(Surv(time, status == 2) ~ A, data = test)
plot(km, col = c(1, 2))
legend("topright", legend = c("!A", "A"), col = c(1, 2), lty = 1)