Function genoutlier finds and excludes outlying (concentration) values according to the selected method and draws a plot of the outliers.
x |
a vector of concentration values or data frame of genasis/openair type. See 'Details' for more detailed description of both data types. |
y |
a vector of measurement dates in the case of vector input only. |
input |
a type of data.frame in the case of data.frame input. The allowed values are "openair" (default) and "genasis". In case of vector input, this argument is meaningless. |
output |
a type of output data.frame. As in the |
method |
method of threshold(s) determination. Allowed values are |
sides |
if |
pollutant |
name(s) of the pollutant(s) for which the outliers are to be found. Not necessary if data for only one pollutant are available in |
plot |
logical. Indicates whether the plot should be drawn. |
columns |
number of columns in the multi-plot arrangement. |
col.points |
color of the non-outlying points inside the plot. |
pch |
plotting 'character', i.e., symbol to use. For more details see points. |
xlab |
the x label of the plot. |
ylab |
the y label of the plot. |
main |
overall title for the plot. |
The genoutlier function finds outlying (concentration) values according to a criterion given by the arguments method and sides and replaces them with NA. The function recognises three different input formats. Option input="openair" uses the "openair" data frame format, with a first column named "date" of class "Date", optional columns named "date_end", "temp", "wind" and "note", and other columns of class "numeric" containing concentration values and named after the compounds. input="genasis" is used for a data frame with six columns "valu", "comp", "date_start", "date_end", "temp" and "wind", where the first, fifth and sixth are of class "numeric", the second is of class "character", and the third and fourth may be of either "character" or "Date" class. The column names for input="genasis" are not fixed; only their order is assumed. It is also possible to specify x and y as two vectors of equal length, the first of class "numeric" containing concentration values, the second of class "character" or "Date" containing measurement dates.
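The two data frame layouts described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the compound name "compound_a" is made up, and filling "temp" and "wind" with NA is an assumption about what the package tolerates, not documented behaviour. No genasis function is called here.

```r
# Ten consecutive measurement dates and simulated concentrations.
n <- 10
dates <- seq(as.Date("2013-01-01"), by = "day", length.out = n)
set.seed(1)
conc <- rnorm(n) + 12

# "openair" layout: a "date" column of class Date first,
# then one numeric column per compound, named after the compound.
sample_openair <- data.frame(date = dates, compound_a = conc)

# "genasis" layout: six columns in the fixed order
# value, compound, start date, end date, temperature, wind.
# Assumption: temp and wind are unknown here and set to NA.
sample_genasis <- data.frame(
  valu = conc,
  comp = "compound_a",
  date_start = dates,
  date_end = dates + 1,
  temp = NA_real_,
  wind = NA_real_,
  stringsAsFactors = FALSE
)

str(sample_openair)
str(sample_genasis)
```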
The output argument specifies the type of the result. Both "data.frame" output types, output="openair" and output="genasis", are available. The default value is equal to the input argument, so vector output is possible only if x is of class "numeric" and output is not specified.
There are seven available methods for setting the outlier thresholds. method="m3s" sets the lower threshold to the sample mean - 3 standard deviations and the upper threshold to the sample mean + 3 standard deviations. The variant method="m2s" works similarly, with 2 standard deviations instead of 3. For log-normally distributed data, the variant method="lm3s" could work better, setting the lower threshold to geometric mean / 3 geometric standard deviations and the upper threshold to geometric mean * 3 geometric standard deviations. Analogously, method="lm2s" works with 2 geometric standard deviations. The non-parametric variants "iqr2", "iqr4" and "iqr7" set the lower threshold to the 25th percentile - a * interquartile range and the upper threshold to the 75th percentile + a * interquartile range, with the parameter a equal to 0.5, 1.5 and 3 respectively (thus the whole range spans 2, 4 and 7 times the interquartile range).
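The threshold rules above can be reproduced approximately in base R. The helper outlier_thresholds below is hypothetical and is not part of genasis. It rests on two assumptions: the log-normal variants are computed on the log scale and back-transformed (that is, geometric mean divided or multiplied by the geometric standard deviation raised to the power 2 or 3), and quartiles come from the default quantile() type. The package itself may compute either differently.

```r
# Hypothetical helper, not part of genasis: returns c(lower, upper).
outlier_thresholds <- function(x, method = "m3s") {
  x <- x[!is.na(x)]
  if (method %in% c("m3s", "m2s")) {
    # Mean +/- k standard deviations.
    k <- if (method == "m3s") 3 else 2
    c(lower = mean(x) - k * sd(x), upper = mean(x) + k * sd(x))
  } else if (method %in% c("lm3s", "lm2s")) {
    # Assumption: work on the log scale, then back-transform.
    k <- if (method == "lm3s") 3 else 2
    lx <- log(x)
    c(lower = exp(mean(lx) - k * sd(lx)),
      upper = exp(mean(lx) + k * sd(lx)))
  } else if (method %in% c("iqr2", "iqr4", "iqr7")) {
    # Quartiles -/+ a * interquartile range.
    a <- c(iqr2 = 0.5, iqr4 = 1.5, iqr7 = 3)[[method]]
    q <- quantile(x, c(0.25, 0.75), names = FALSE)
    iqr <- q[2] - q[1]
    c(lower = q[1] - a * iqr, upper = q[2] + a * iqr)
  } else {
    stop("unknown method: ", method)
  }
}

set.seed(1)
conc <- rnorm(100) + 12
outlier_thresholds(conc, "m3s")
outlier_thresholds(conc, "lm3s")
outlier_thresholds(conc, "iqr4")
```

For the "iqr" variants, the distance between the two thresholds equals (1 + 2a) times the interquartile range, which is where the factors 2, 4 and 7 in the method names come from.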
The argument sides specifies whether one-sided or two-sided exclusion of outliers is performed. With sides=2 (default), outliers both below the lower and above the upper threshold are excluded; with sides=1, only outliers above the upper threshold are excluded.
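The effect of sides can be illustrated with a small base R sketch. The helper exclude_outliers is hypothetical and not part of genasis, and the thresholds are passed in by hand rather than derived from a method. It only mirrors the behaviour described above: values outside the thresholds are replaced by NA.

```r
# Hypothetical helper, not part of genasis.
exclude_outliers <- function(x, lower, upper, sides = 2) {
  out <- x > upper                        # always test the upper threshold
  if (sides == 2) out <- out | x < lower  # two-sided: also test the lower one
  x[which(out)] <- NA                     # which() skips values already NA
  x
}

conc <- c(0.1, 11.8, 12.3, 12.1, 40)
exclude_outliers(conc, lower = 5, upper = 20, sides = 2)
# [1]   NA 11.8 12.3 12.1   NA
exclude_outliers(conc, lower = 5, upper = 20, sides = 1)
# [1]  0.1 11.8 12.3 12.1   NA
```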
a list containing:
res |
the data frame (or vector) according to the |
lower |
numeric value of lower threshold |
upper |
numeric value of upper threshold |
Jiri Kalina
kalina@mail.muni.cz
genloq, genhistogram, genpastoact, genanaggr,
genplot, genstatistic, gentransform, genwhisker
## Definition of simple data sources:
c1<-rnorm(100)+12
c2<-"random compound"
c3<-as.Date(as.Date("2013-01-01"):as.Date("2013-04-10"),
origin="1970-01-01")
c4<-c3+1
sample_genasis<-data.frame(c1,c2,c3,c4)
sample_openair<-data.frame(c4,c1)
colnames(sample_openair)=c("date",c2)
## Examples of different usages:
genoutlier(sample_openair,input="openair",pollutant="random compound",
method="m2s")
genoutlier(sample_genasis,input="genasis",method="m3s")
## Use of example data from the package:
data(kosetice.pas.openair)
genoutlier(genpastoact(kosetice.pas.openair[,1:8]),method="lm3s",
main="Outliers",ylab="Concentration ngm-3")
genoutlier(kosetice.pas.openair[,c(1:4,23:26)],col.points="orange",
method="lm3s")
data(kosetice.pas.genasis)
genoutlier(kosetice.pas.genasis[625:832,],input="genasis",
method="lm2s",sides=1)