Outlier detection algorithms using quantile regression for censored data
formula | a type of
data | a data frame with variables used in the
method | the outlier detection method to be used. The options
rq.model | the type of censored quantile regression to be used for fitting. The options
k_r | a value to control the tightness of cut-offs for the residual algorithm with a default value of 1.5.
k_b | a value to control the tightness of cut-offs for the boxplot algorithm with a default value of 1.5.
h | bandwidth for locally weighted censored quantile regression with a default value of 0.05.
The odc function implements three outlier detection algorithms based on censored quantile regression: residual-based, boxplot, and scoring. The residual-based algorithm detects outlying observations using a constant scale estimate, so it does not account for heterogeneity of variability. When the data are extremely heterogeneous, the boxplot algorithm with censored quantile regression is more effective. The residual-based and boxplot algorithms produce cut-offs that determine whether an observation is an outlier. In contrast, the scoring algorithm reports the outlying magnitude, or deviation, of each point from the distribution of observations; outliers are then identified by visualising the scores.
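As a rough illustration of the boxplot cut-off, the sketch below builds an upper fence from conditional quartiles. It assumes a Tukey-style rule, q75 + k_b * (q75 - q25), which is not taken from the package source. It uses the rounded q25 and q75 coefficients printed by coef(fit) in the Examples output, and the helper name upper_fence is hypothetical. With meta = 9 and time = 88 (observation 346), the result is close to the UB value of 4.32 reported for the boxplot fit.

```r
# Rounded coefficients copied from the coef(fit) output in the Examples.
coefs <- rbind(
  "(Intercept)" = c(q25 = 2.565, q75 = 4.500),
  meta          = c(q25 = -0.077, q75 = -0.183)
)

# Assumed Tukey-style upper fence built from the conditional quartiles.
# This is an illustration, not the package's internal code.
upper_fence <- function(meta, k_b = 1.5) {
  q25 <- coefs["(Intercept)", "q25"] + coefs["meta", "q25"] * meta
  q75 <- coefs["(Intercept)", "q75"] + coefs["meta", "q75"] * meta
  unname(q75 + k_b * (q75 - q25))
}

ub <- upper_fence(meta = 9)  # observation 346 has meta = 9
y  <- log(88)                # its log survival time (time = 88)
is_outlier <- y > ub         # flagged when the response exceeds the fence
```

Because the fence widens or narrows with the fitted interquartile range at each covariate value, this kind of rule adapts to heterogeneous variability, which a single constant scale estimate cannot do.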
an object of the S4 class "OutlierDC" with the following slots:
call: evaluated function call
formula: formula to be used
raw.data: data to be used for model fitting
refined.data: the data set after removing outliers
outlier.data: the data set containing outliers
coefficients: the estimated censored quantile regression coefficient matrix consisting of 10th, 25th, 50th, 75th, and 90th quantiles
fitted.mat: the censored quantile regression fitted value matrix consisting of 10th, 25th, 50th, 75th, and 90th quantiles
score: outlying scores (scoring algorithm) or residuals (residual-based algorithm)
cutoff: estimated scale parameter for the residual-based algorithm
lower: lower fence vector used for the boxplot and scoring algorithms
upper: upper fence vector used for the boxplot and scoring algorithms
outliers: logical vector to determine which observations are outliers
n.outliers: number of outliers detected
method: outlier detection method to be used
rq.model: censored quantile regression to be used
k_r: a value to be used for the tightness of cut-offs in the residual algorithm
k_b: a value to be used for the tightness of cut-offs in the boxplot algorithm
k_s: a value to be used for the tightness of upper fence cut-offs in the scoring algorithm, set with the update function
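Slots of an S4 object are read with the @ operator or with slot(). The sketch below shows the access pattern on a small stand-in class, so that it runs without the package. The class name ToyOutlierDC and its contents are hypothetical; only the slot names outliers and n.outliers are borrowed from the list above.

```r
library(methods)

# Hypothetical stand-in class carrying two of the slots listed above.
setClass("ToyOutlierDC",
         representation(outliers = "logical", n.outliers = "numeric"))

flags <- c(FALSE, TRUE, FALSE, TRUE)
toy <- new("ToyOutlierDC",
           outliers = flags,
           n.outliers = as.numeric(sum(flags)))

slot_names  <- slotNames(toy)            # names of the available slots
flagged_idx <- which(toy@outliers)       # indices of flagged observations
n_flagged   <- slot(toy, "n.outliers")   # same value as toy@n.outliers
```

On a fitted object, the same pattern applies to the slots described above, for example fit@outliers or fit@n.outliers.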
Eo S-H, Hong S-M, Cho H (2014). Identification of outlying observations with quantile regression for censored data. Submitted.
Wang HJ, Wang L (2009) Locally Weighted Censored Quantile Regression. JASA 104:1117–1128. doi: 10.1198/jasa.2009.tm08230
OutlierDC-package, coef, plot, show, update
## Not run:
require(OutlierDC)
# Toy example
data(ebd)
# The data consists of 402 observations with 6 variables.
dim(ebd)
# To show the first six observations of the dataset,
head(ebd)
# Scoring algorithm
fit <- odc(Surv(log(time), status) ~ meta, data = ebd)
fit
coef(fit)
plot(fit)
# Add an upper bound for the selection of outliers
fit1 <- update(fit, k_s = 4)
fit1
plot(fit1)
# Residual-based algorithm
fit2 <- odc(Surv(log(time), status) ~ meta, data = ebd, method = "residual", k_r = 1.5)
fit2
plot(fit2)
# Display all outlying observations in the fitted object
fit2@outlier.data
# Boxplot algorithm
fit3 <- odc(Surv(log(time), status) ~ meta, data = ebd, method = "boxplot", k_b = 1.5)
fit3
plot(fit3, ylab = "log survival times", xlab = "metastasis lymph nodes")
## End(Not run)
Loading required package: survival
Loading required package: quantreg
Loading required package: SparseM
Attaching package: 'SparseM'
The following object is masked from 'package:base':
backsolve
Attaching package: 'quantreg'
The following object is masked from 'package:survival':
untangle.specials
Loading required package: Formula
Package OutlierDC (0.3-0) loaded.
[1] 402 6
id meta exam status time ratio
1787 55468952 0 12 1 26 0.0000000
1788 8883016 0 12 1 11 0.0000000
1789 10647194 0 12 0 134 0.0000000
1790 16033679 2 12 1 1 0.1666667
1791 19519884 0 12 0 111 0.0000000
1792 19574077 0 12 1 8 0.0000000
Please wait...
Done.
Outlier Detection for Censored Data
Call: odc(formula = Surv(log(time), status) ~ meta, data = ebd)
Algorithm: Scoring algorithm (score)
Model: Locally weighted censored quantile regression (Wang)
Value for cut-off k_s:
# of outliers detected: 0
Top 6 outlying scores:
times delta (Intercept) meta score Outlier
346 4.48 0 1 9 4.59
327 2.71 1 1 13 4.54
326 2.08 1 1 14 2.52
296 4.86 1 1 4 2.35
354 3.09 1 1 10 2.11
233 5.29 0 1 1 1.95
q10 q25 q50 q75 q90
(Intercept) 1.632 2.565 3.401 4.500 5.196
meta -0.022 -0.077 -0.111 -0.183 -0.191
Outlier Detection for Censored Data
Call: odc(formula = Surv(log(time), status) ~ meta, data = ebd)
Algorithm: Scoring algorithm (score)
Model: Locally weighted censored quantile regression (Wang)
Value for cut-off k_s: 4
# of outliers detected: 2
Top 6 outlying scores:
times delta (Intercept) meta score Outlier
346 4.48 0 1 9 4.59 *
327 2.71 1 1 13 4.54 *
326 2.08 1 1 14 2.52
296 4.86 1 1 4 2.35
354 3.09 1 1 10 2.11
233 5.29 0 1 1 1.95
Please wait...
Done.
Outlier Detection for Censored Data
Call: odc(formula = Surv(log(time), status) ~ meta, data = ebd, method = "residual",
k_r = 1.5)
Algorithm: Residual-based algorithm (residual)
Model: Locally weighted censored quantile regression (Wang)
Value for cut-off k_r: 1.5
# of outliers detected: 9
Outliers detected:
times delta (Intercept) meta residual sigma Outlier
57 4.80 0 1 2 1.63 1.6 *
80 5.04 1 1 0 1.64 1.6 *
189 5.38 0 1 0 1.98 1.6 *
191 5.20 0 1 0 1.80 1.6 *
233 5.29 0 1 1 2.00 1.6 *
296 4.86 1 1 4 1.90 1.6 *
6 of all 9 outliers were displayed.
id meta exam status time ratio
57 39165334 2 13 0 122 0.15384615
80 2022934 0 13 1 154 0.00000000
189 25678892 0 16 0 217 0.00000000
191 10521031 0 17 0 181 0.00000000
233 52223267 1 18 0 198 0.05555556
296 27085350 4 20 1 129 0.20000000
346 12269804 9 24 0 88 0.37500000
357 17822095 0 25 1 157 0.00000000
395 43506173 0 37 0 152 0.00000000
Please wait...
Done.
Outlier Detection for Censored Data
Call: odc(formula = Surv(log(time), status) ~ meta, data = ebd, method = "boxplot",
k_b = 1.5)
Algorithm: Boxplot algorithm (boxplot)
Model: Locally weighted censored quantile regression (Wang)
Value for cut-off k_b: 1.5
# of outliers detected: 1
Outliers detected:
times delta (Intercept) meta UB Outlier
346 4.48 0 1 9 4.32 *
1 of all 1 outliers were displayed.
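The scores printed for the scoring algorithm can be approximated by hand from the coefficient table. The sketch below assumes that the score is the distance of the log survival time above the conditional median, divided by the gap between the conditional 75th and 50th percentiles. This normalisation is inferred from the printed output rather than taken from the package source, and the helper name score is hypothetical. Because the coefficients and times are rounded as printed, the results are close to, but not identical to, the reported scores.

```r
# Rounded q50 and q75 coefficients copied from the coef(fit) output.
b50 <- c(3.401, -0.111)
b75 <- c(4.500, -0.183)

# Assumed score: distance above the conditional median, in units of the
# gap between the conditional 75th and 50th percentiles.
score <- function(y, meta) {
  q50 <- b50[1] + b50[2] * meta
  q75 <- b75[1] + b75[2] * meta
  (y - q50) / (q75 - q50)
}

k_s  <- 4
s346 <- score(y = 4.48, meta = 9)   # printed score for observation 346: 4.59
s326 <- score(y = 2.08, meta = 14)  # printed score for observation 326: 2.52
flagged <- c(obs346 = s346 > k_s, obs326 = s326 > k_s)
```

Under this assumption, observation 346 exceeds k_s = 4 and observation 326 does not, which agrees with the Outlier column printed after update(fit, k_s = 4).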