odc: Outlier detection using quantile regression for censored data


Description

Outlier detection algorithms using quantile regression for censored data.

Usage

     odc(formula, data,
         method = c("score", "boxplot", "residual"),
         rq.model = c("Wang", "PengHuang", "Portnoy"),
         k_r = 1.5, k_b = 1.5, h = .05)

Arguments

formula

a Formula object with a survival object on the left-hand side of the ~ operator and covariate terms on the right-hand side. The survival object, which holds the survival time and its censoring status, is constructed by the Surv function in the survival package.

data

a data frame with variables used in the formula. It needs at least three variables, including survival time, censoring status, and covariates.

method

the outlier detection method to be used. The options "score", "boxplot", and "residual" conduct the scoring, boxplot, and residual-based algorithms, respectively. The default algorithm is "score".

rq.model

the type of censored quantile regression to be used for fitting. The options "Wang", "Portnoy", and "PengHuang" conduct Wang and Wang's, Portnoy's, and Peng and Huang's censored quantile regression approaches, respectively. The default is "Wang".

k_r

a value to control the tightness of cut-offs for the residual algorithm with a default value of 1.5.

k_b

a value to control the tightness of cut-offs for the boxplot algorithm with a default value of 1.5.

h

bandwidth for locally weighted censored quantile regression with a default value of 0.05.
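
As a minimal sketch of how the formula and data arguments fit together, the following builds a small data frame and the matching survival formula. The data frame name toy is hypothetical; its values are the time, status, and meta columns of the first six ebd rows shown in the example output. Only Surv from the survival package is called, and the odc call is left as a comment because it needs OutlierDC to be installed.

```r
library(survival)  # provides Surv()

# Hypothetical toy data frame; values copied from the head(ebd) output.
# In Surv(), status 1 marks an observed event and 0 a censored time.
toy <- data.frame(
  time   = c(26, 11, 134, 1, 111, 8),
  status = c(1, 1, 0, 1, 0, 1),
  meta   = c(0, 0, 0, 2, 0, 0)
)

# Survival object on the left-hand side, covariate on the right-hand side.
f <- Surv(log(time), status) ~ meta

# The response built by Surv() has one column for time and one for status.
y <- with(toy, Surv(log(time), status))
dim(y)

# Variables the formula expects to find in the data frame.
all.vars(f)

# With OutlierDC installed and the full data set loaded, the call would be:
# fit <- odc(Surv(log(time), status) ~ meta, data = ebd)
```

The data frame therefore needs one column for each name returned by all.vars(f).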

Details

The odc function implements three outlier detection algorithms based on censored quantile regression: residual-based, boxplot, and scoring. The residual-based algorithm detects outlying observations using constant scale estimates, so it does not account for heterogeneity in variability. When the data are extremely heterogeneous, the boxplot algorithm with censored quantile regression is more effective. Both the residual-based and boxplot algorithms produce cut-offs that determine whether an observation is an outlier. In contrast, the scoring algorithm provides the outlying magnitude, or deviation, of each point from the distribution of observations; outliers are then identified by visualising the scores.
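
To make the idea of a covariate-dependent cut-off concrete, here is a rough sketch of an upper boxplot fence built from fitted conditional quartiles. The fence rule Q75 + k_b * (Q75 - Q25) is the usual boxplot convention and is an assumption here, not something taken from the package source; the helper names fitted_q and upper_fence are hypothetical. The coefficients are the rounded q25 and q75 values printed by coef(fit) in the Examples section, so the result is approximate.

```r
# Coefficients copied from the coef(fit) output in the Examples section
# (rounded to three decimals there).
q25 <- c(intercept = 2.565, meta = -0.077)
q75 <- c(intercept = 4.500, meta = -0.183)

# Hypothetical helper: fitted conditional quantile at covariate value x.
fitted_q <- function(b, x) b[["intercept"]] + b[["meta"]] * x

# Assumed fence rule: UB = Q75 + k_b * (Q75 - Q25), evaluated at each x.
upper_fence <- function(x, k_b = 1.5) {
  iqr <- fitted_q(q75, x) - fitted_q(q25, x)
  fitted_q(q75, x) + k_b * iqr
}

# Observation 346 in the example output: log survival time 4.48, meta = 9.
ub <- upper_fence(9)
round(ub, 2)   # 4.32
4.48 > ub      # TRUE: the observation lies above the fence
```

Under this assumption the fence at meta = 9 matches the UB value of 4.32 printed for observation 346 in the boxplot output. Because the interquartile range is recomputed at every covariate value, the fence narrows or widens with the local spread, which is what a single constant scale estimate cannot do.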

Value

an object of the S4 class "OutlierDC" with the following slots:
call: evaluated function call
formula: formula to be used
raw.data: data to be used for model fitting
refined.data: the data set after removing outliers
outlier.data: the data set containing outliers
coefficients: the estimated censored quantile regression coefficient matrix consisting of 10th, 25th, 50th, 75th, and 90th quantiles
fitted.mat: the censored quantile regression fitted value matrix consisting of 10th, 25th, 50th, 75th, and 90th quantiles
score: outlying scores (scoring algorithm) or residuals (residual-based algorithm)
cutoff: estimated scale parameter for the residual-based algorithm
lower: lower fence vector used for the boxplot and scoring algorithms
upper: upper fence vector used for the boxplot and scoring algorithms
outliers: logical vector to determine which observations are outliers
n.outliers: number of outliers detected
method: outlier detection method to be used
rq.model: censored quantile regression to be used
k_r: a value to be used for the tightness of cut-offs in the residual algorithm
k_b: a value to be used for the tightness of cut-offs in the boxplot algorithm
k_s: a value to be used for the tightness of upper fence cut-offs used for the scoring algorithm with the update function
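
The relationship between the score, cutoff, outliers, and n.outliers slots can be sketched for the residual-based algorithm as follows. This is an illustration under stated assumptions, not the package's implementation: the residual is assumed to be the observed log time minus the fitted conditional median, and the cut-off of 1.6 is simply the value printed in the sigma column of the example output (how it is derived from k_r is not shown here). The object names b50, obs, and n_outliers are hypothetical.

```r
# Median (q50) coefficients copied from the coef(fit) output in the Examples section.
b50 <- c(intercept = 3.401, meta = -0.111)

# A few rows copied from the example output: row label, log survival time, covariate.
obs <- data.frame(
  id    = c(57, 80, 233, 296, 326, 327),
  times = c(4.80, 5.04, 5.29, 4.86, 2.08, 2.71),
  meta  = c(2, 0, 1, 4, 14, 13)
)

# Assumed residual: observed log time minus the fitted conditional median.
obs$residual <- obs$times - (b50[["intercept"]] + b50[["meta"]] * obs$meta)

# Cut-off as printed in the "sigma" column of the residual-based output.
cutoff <- 1.6

# Logical flags and their count, analogous to the outliers and n.outliers slots.
obs$outlier <- obs$residual > cutoff
n_outliers  <- sum(obs$outlier)

obs
n_outliers
```

With these rounded inputs the residuals for rows 57, 80, 233, and 296 come out close to the printed values (about 1.62, 1.64, 2.00, and 1.90) and exceed the cut-off, while rows 326 and 327 do not, in line with their absence from the outlier listing in the example output.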

Source

Eo S-H, Hong S-M, Cho H (2014). Identification of outlying observations with quantile regression for censored data. Submitted.

Wang HJ, Wang L (2009) Locally Weighted Censored Quantile Regression. JASA 104:1117–1128. doi: 10.1198/jasa.2009.tm08230

See Also

OutlierDC-package
coef, plot, show, update

Examples

  ## Not run: 
    require(OutlierDC)
    # Toy example 
    data(ebd)
    # The data consists of 402 observations with 6 variables. 
    dim(ebd)
    # To show the first six observations of the dataset,
    head(ebd)
    
    # scoring algorithm
    fit <- odc(Surv(log(time), status) ~ meta, data = ebd)
    fit
    coef(fit)
    plot(fit)

    # Add an upper bound for the selection of outliers
    fit1 <- update(fit, k_s = 4)
    fit1
    plot(fit1)

    # residual-based algorithm
    fit2 <- odc(Surv(log(time), status) ~ meta, data = ebd, method = "residual", k_r = 1.5)
    fit2
    plot(fit2)
    
    # Display all outlying observations in the fitted object
    fit2@outlier.data
    
    # boxplot algorithm
    fit3 <- odc(Surv(log(time), status) ~ meta, data = ebd, method = "boxplot", k_b = 1.5)
    fit3
    plot(fit3, ylab = "log survival times", xlab = "metastasis lymph nodes")

## End(Not run)

Example output

Loading required package: survival
Loading required package: quantreg
Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve


Attaching package: 'quantreg'

The following object is masked from 'package:survival':

    untangle.specials

Loading required package: Formula

Package OutlierDC (0.3-0) loaded.
[1] 402   6
           id meta exam status time     ratio
1787 55468952    0   12      1   26 0.0000000
1788  8883016    0   12      1   11 0.0000000
1789 10647194    0   12      0  134 0.0000000
1790 16033679    2   12      1    1 0.1666667
1791 19519884    0   12      0  111 0.0000000
1792 19574077    0   12      1    8 0.0000000
Please wait... 
Done. 

     Outlier Detection for Censored Data

 Call: odc(formula = Surv(log(time), status) ~ meta, data = ebd)
 Algorithm: Scoring algorithm (score) 
 Model: Locally weighted censored quantile regression (Wang) 
 Value for cut-off k_s:   
 # of outliers detected:  0 

 Top 6 outlying scores:
    times delta (Intercept) meta score Outlier
346  4.48     0           1    9  4.59        
327  2.71     1           1   13  4.54        
326  2.08     1           1   14  2.52        
296  4.86     1           1    4  2.35        
354  3.09     1           1   10  2.11        
233  5.29     0           1    1  1.95        
               q10    q25    q50    q75    q90
(Intercept)  1.632  2.565  3.401  4.500  5.196
meta        -0.022 -0.077 -0.111 -0.183 -0.191

     Outlier Detection for Censored Data

 Call: odc(formula = Surv(log(time), status) ~ meta, data = ebd)
 Algorithm: Scoring algorithm (score) 
 Model: Locally weighted censored quantile regression (Wang) 
 Value for cut-off k_s:  4 
 # of outliers detected:  2 

 Top 6 outlying scores:
    times delta (Intercept) meta score Outlier
346  4.48     0           1    9  4.59       *
327  2.71     1           1   13  4.54       *
326  2.08     1           1   14  2.52        
296  4.86     1           1    4  2.35        
354  3.09     1           1   10  2.11        
233  5.29     0           1    1  1.95        
Please wait... 
Done. 

     Outlier Detection for Censored Data

 Call: odc(formula = Surv(log(time), status) ~ meta, data = ebd, method = "residual", 
    k_r = 1.5)
 Algorithm: Residual-based algorithm (residual) 
 Model: Locally weighted censored quantile regression (Wang) 
 Value for cut-off k_r:  1.5 
 # of outliers detected:  9 

 Outliers detected:
    times delta (Intercept) meta residual sigma Outlier
57   4.80     0           1    2     1.63   1.6       *
80   5.04     1           1    0     1.64   1.6       *
189  5.38     0           1    0     1.98   1.6       *
191  5.20     0           1    0     1.80   1.6       *
233  5.29     0           1    1     2.00   1.6       *
296  4.86     1           1    4     1.90   1.6       *

 6 of all 9 outliers were displayed. 
          id meta exam status time      ratio
57  39165334    2   13      0  122 0.15384615
80   2022934    0   13      1  154 0.00000000
189 25678892    0   16      0  217 0.00000000
191 10521031    0   17      0  181 0.00000000
233 52223267    1   18      0  198 0.05555556
296 27085350    4   20      1  129 0.20000000
346 12269804    9   24      0   88 0.37500000
357 17822095    0   25      1  157 0.00000000
395 43506173    0   37      0  152 0.00000000
Please wait... 
Done. 

     Outlier Detection for Censored Data

 Call: odc(formula = Surv(log(time), status) ~ meta, data = ebd, method = "boxplot", 
    k_b = 1.5)
 Algorithm: Boxplot algorithm (boxplot) 
 Model: Locally weighted censored quantile regression (Wang) 
 Value for cut-off k_b:  1.5 
 # of outliers detected:  1 

 Outliers detected:
    times delta (Intercept) meta   UB Outlier
346  4.48     0           1    9 4.32       *

 1 of all 1 outliers were displayed. 

OutlierDC documentation built on May 1, 2019, 11:31 p.m.