Detects outlying observations in censored data, with particular application to lifetime studies.
formula: type of
data: data frame with variables used in the
alg: type of an outlier detection algorithm. Three algorithms are provided. The options
reg: type of a regression method used as a basis for outlier detection algorithms. The options
fence: type of an outlying fence. Three options are provided. The option
kr: numeric value to control the tightness of cut-offs for the residual algorithm with a default value of 3.
kb: numeric value to control the tightness of cut-offs for the boxplot algorithm with a default value of 1.5.
h: bandwidth for locally weighted censored quantile regression with a default value of 0.05.
The odc function detects outliers on the basis of censored quantile regression. Three algorithms are implemented: residual-based, boxplot, and scoring. The residual-based algorithm detects outlying observations using a constant scale estimate, so it does not account for heterogeneity of variability. When the data are strongly heterogeneous, the boxplot algorithm with censored quantile regression is more effective. The residual-based and boxplot algorithms produce cut-offs that determine whether observations are outliers. In contrast, the scoring algorithm provides the outlying magnitude, or deviation, of each point from the distribution of observations; outliers are then identified by visualising the scores.
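The contrast between a constant-scale cut-off and quantile-based fences can be sketched in base R. This is an illustration under loud assumptions, not the OutlierDC implementation: the toy data are uncensored, ordinary least squares and binned sample quartiles stand in for censored quantile regression, and the MAD is an assumed choice of scale estimate. Only the names kr and kb and their defaults come from this page; every other name is hypothetical.

```r
# Illustrative sketch only: uncensored toy data and base R stand-ins,
# NOT the algorithms as implemented in OutlierDC.
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.3 + 0.1 * x)  # spread grows with x
y[1] <- y[1] + 15                                # plant one gross outlier

# Residual-style rule: a single constant scale for every observation.
kr <- 3
res <- resid(lm(y ~ x))
scale_hat <- mad(res)                 # assumed robust scale estimate
out_resid <- abs(res) > kr * scale_hat

# Boxplot-style rule: fences built from conditional quartiles, so the
# cut-off widens where the data are more variable. The quartiles here
# are crude, taken within five equal-count bins of x.
kb <- 1.5
bins <- cut(x, breaks = quantile(x, 0:5 / 5), include.lowest = TRUE)
q25 <- ave(y, bins, FUN = function(v) quantile(v, 0.25))
q75 <- ave(y, bins, FUN = function(v) quantile(v, 0.75))
upper <- q75 + kb * (q75 - q25)
lower <- q25 - kb * (q75 - q25)
out_box <- y > upper | y < lower

# Number of observations flagged by each rule
c(residual = sum(out_resid), boxplot = sum(out_box))
```

Because the residual-style rule uses one scale everywhere, it tends to flag more points where the spread is large, while the fence-style rule adapts its width along x.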
an object of the S4 class "OutlierDC" with the following slots:
call: evaluated function call
formula: formula to be used
raw.data: data to be used for model fitting
refined.data: the data set after removing outliers
outlier.data: the data set containing outliers
coefficients: the estimated censored quantile regression coefficient matrix consisting of 10th, 25th, 50th, 75th, and 90th quantiles
fitted.mat: the censored quantile regression fitted value matrix consisting of 10th, 25th, 50th, 75th, and 90th quantiles
score: outlying scores (scoring algorithm) or residuals (residual-based algorithm)
cutoff: estimated scale parameter for the residual-based algorithm
lower: lower fence vector used for the boxplot and scoring algorithms
upper: upper fence vector used for the boxplot and scoring algorithms
outliers: logical vector to determine which observations are outliers
n.outliers: number of outliers detected
method: outlier detection method to be used
rq.model: censored quantile regression to be used
kr: a value to be used for the tightness of cut-offs in the residual algorithm
kb: a value to be used for the tightness of cut-offs in the boxplot algorithm
ks: a value to be used for the tightness of upper fence cut-offs used for the scoring algorithm with the update function
fence: type of fence to be used in the model fitting
alpha: numeric value for the significance level
boot.dist: empirical quantiles by Jackknife-after-Bootstrapping
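As a brief sketch of how the slots listed above might be inspected, the following uses the standard S4 accessor `@` together with `slotNames()`. It assumes the OutlierDC package is installed and that the ebd data set is available, as in the Examples on this page; the slot names are taken from the list above and have not been checked against a particular package version.

```r
# Assumes OutlierDC is installed; the block is skipped otherwise.
if (requireNamespace("OutlierDC", quietly = TRUE)) {
  library(OutlierDC)
  data(ebd)
  fit <- odc(Surv(log(time), status) ~ meta, data = ebd, alg = "boxplot")

  print(slotNames(fit))       # names of all slots in the S4 object
  print(fit@n.outliers)       # number of observations flagged
  print(head(fit@outliers))   # logical flags for the first observations
}
```

Slots are read with `@` rather than `$` because the returned object is an S4 instance, not a list.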
Eo, S-H, Hong, S-M, and Cho, H. (2014+). Identification of outlying observations with quantile regression for censored data, Submitted.
Martin, M. A., and Roberts, S. (2010). Jackknife-after-bootstrap regression influence diagnostics, Journal of Nonparametric Statistics, 22, 257-269.
Nardi, A., and Schemper, M. (1999). New residuals for Cox regression and their application to outlier screening, Biometrics, 55, 523-529.
Wang, H. J., and Wang, L. (2009). Locally weighted censored quantile regression, Journal of the American Statistical Association, 104, 1117-1128.
OutlierDC-package, coef, plot, show, update
## Not run:
library(OutlierDC)
data(ebd)
str(ebd)
####
# outlier detection using the scoring algorithm
fit = odc(Surv(log(time), status) ~ meta, data = ebd)
fit
# A threshold k_s is added to this plot using the update() function
fit1 = update(fit, ks = 4)
fit1
plot(fit1)
# A threshold can be determined by using the empirical distribution of standard deviation for the outlying scores
fit2 = bodc(fit, B = 500)
fit2
####
# outlier detection using the residual-based algorithm
fit3 = odc(Surv(log(time), status) ~ meta, data = ebd, alg = "residual")
fit3
plot(fit3, main = "Residual-based algorithm")
fit3@outlier.data
####
# outlier detection using the boxplot-based algorithm
fit4 = odc(Surv(log(time), status) ~ meta, data = ebd, alg = "boxplot")
fit4
plot(fit4, main = "Boxplot-based algorithm", xlab = "Number of metastatic lymph nodes", ylab = "Log of survival times")
## End(Not run)