outliers: Analysing outliers in compositions.

Description Details Note Author(s) References See Also Examples

Description

The Philosophy behind outlier treatment in library(compositions).

Details

Outliers are omnipresent in all kinds of data analysis. To avoid catastrophic misinterpreations robust statistics has developed some methods to avoid the distracting influence of the outliers. The introduction of robust methods into the compositions package is described in robustnessInCompositions.

However sometimes we are interested directly in the analysis of outliers. The central philosophy of the the outlier classification subsystem in compositions is that outlier are in most cases not simply erroneous observations, but rather products of some systematic anomality. This can e.g. be an error in an individual component, a secondary process or a minor undetected but different subpopulation. The package provides various concepts to investigate possible reasons for outliers in compositional datasets.

With these tools we hope to provide a systematic approach to identify various types of outliers in a exploratory analysis.

Note

The package robustbase is required for using the robust estimations and the outlier subsystem of compositions. To simplify installation it is not listed as required, but it will be loaded, whenever any sort of outlierdetection or robust estimation is used.

Author(s)

K.Gerald v.d. Boogaart http://www.stat.boogaart.de

References

K. Gerald van den Boogaart, Raimon Tolosana-Delgado, Matevz-Bren (2009) Robustness, classification and visualization of outliers in compositional data, in prep.

See Also

compositions-package, missingsInCompositions, robustnessInCompositions, outliersInCompositions, outlierplot, OutlierClassifier1, ClusterFinder1

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
## Not run: 
# To slow
tmp<-set.seed(1400)
A <- matrix(c(0.1,0.2,0.3,0.1),nrow=2)
Mvar <- 0.1*ilrvar2clr(A%*%t(A))
Mcenter <- acomp(c(1,2,1))
typicalData <- rnorm.acomp(100,Mcenter,Mvar) # main population
colnames(typicalData)<-c("A","B","C")
data1 <- acomp(rnorm.acomp(100,Mcenter,Mvar))
data2 <- acomp(rbind(typicalData+rbinom(100,1,p=0.1)*rnorm(100)*acomp(c(4,1,1))))
data3 <- acomp(rbind(typicalData,acomp(c(0.5,1.5,2))))
colnames(data3)<-colnames(typicalData)
tmp<-set.seed(30)
rcauchy.acomp <- function (n, mean, var){
    D <- gsi.getD(mean)-1
    perturbe(ilrInv(matrix(rnorm(n*D)/rep(rnorm(n),D), ncol = D) %*% chol(clrvar2ilr(var))), mean)
}
data4 <- acomp(rcauchy.acomp(100,acomp(c(1,2,1)),Mvar/4))
colnames(data4)<-colnames(typicalData)
data5 <- acomp(rbind(unclass(typicalData)+outer(rbinom(100,1,p=0.1)*runif(100),c(0.1,1,2))))
data6 <- acomp(rbind(typicalData,rnorm.acomp(20,acomp(c(4,4,1)),Mvar)))
datas <- list(data1=data1,data2=data2,data3=data3,data4=data4,data5=data5,data6=data6)
tmp <-c()
opar<-par(mfrow=c(2,3),pch=19,mar=c(3,2,2,1))  
tmp<-mapply(function(x,y) {
outlierplot(x,type="scatter",class.type="grade");
  title(y)
},datas,names(datas))


par(mfrow=c(2,3),pch=19,mar=c(3,2,2,1))  
tmp<-mapply(function(x,y) {
  myCls2 <- OutlierClassifier1(x,alpha=0.05,type="all",corrected=TRUE)
  outlierplot(x,type="scatter",classifier=OutlierClassifier1,class.type="best",
  Legend=legend(1,1,levels(myCls),xjust=1,col=colcode,pch=pchcode),
  pch=as.numeric(myCls2));
  legend(0,1,legend=levels(myCls2),pch=1:length(levels(myCls2)))
  title(y)
},datas,names(datas))

par(mfrow=c(2,3),pch=19,mar=c(3,2,2,1))  
for( i in 1:length(datas) ) 
  outlierplot(datas[[i]],type="ecdf",main=names(datas)[i])
par(mfrow=c(2,3),pch=19,mar=c(3,2,2,1))  
for( i in 1:length(datas) ) 
  outlierplot(datas[[i]],type="portion",main=names(datas)[i])
par(mfrow=c(2,3),pch=19,mar=c(3,2,2,1))  
for( i in 1:length(datas) ) 
  outlierplot(datas[[i]],type="nout",main=names(datas)[i])
par(opar)

moreData <- acomp(rbind(data3,data5,data6))
take<-OutlierClassifier1(moreData,type="grade")!="ok"
hc<-hclust(dist(normalize(acomp(scale(moreData)[take,]))),method="complete")
plot(hc)
plot(acomp(moreData[take,]),col=cutree(hc,1.5))

## End(Not run)

compositions documentation built on May 30, 2017, 3:25 a.m.