knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(OutliersLearn);
The OutliersLearn R package allows users to learn how outlier detection algorithms work.
Most of the following examples use the same dataset, which is declared as inputData:
inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))));
inputData = data.frame(inputData);
print(inputData);
As can be seen, this is a two-dimensional dataset (a data.frame) with 7 rows. It can be visualized graphically like this:
plot(inputData);
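If it helps to relate points to rows, a labelled version of the same plot can be produced with plain base R; the column names r and d come from the dimnames set above, and the labelling is only an optional illustration:

plot(inputData$r, inputData$d, xlab = "r", ylab = "d", main = "inputData"); #Same scatter plot with axis labels
text(inputData$r, inputData$d, labels = rownames(inputData), pos = 3); #Annotate each point with its row number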
With that being said, the following section explains how to execute the auxiliary functions.
This section shows how to call the auxiliary functions of the OutliersLearn R package. These include:
- euclidean_distance()
- mahalanobis_distance()
- manhattan_dist()
- mean_outliersLearn()
- sd_outliersLearn()
- quantile_outliersLearn()
- transform_to_vector()
First, the distance functions:
- Euclidean distance (euclidean_distance())

point1 = inputData[1,];
point2 = inputData[4,];
distance = euclidean_distance(point1, point2);
print(distance);
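As a quick sanity check, the same value can be computed directly with base R; if euclidean_distance() implements the standard Euclidean distance, both numbers should match:

manual_distance = sqrt(sum((unlist(point1) - unlist(point2))^2)); #Standard Euclidean formula applied to the two rows
print(manual_distance);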
- Mahalanobis distance (mahalanobis_distance())

inputDataMatrix = as.matrix(inputData); #Required conversion for this function
sampleMeans = c();
#Calculate the mean for each column
for(i in 1:ncol(inputDataMatrix)){
  column = inputDataMatrix[,i];
  calculatedMean = sum(column)/length(column);
  sampleMeans = c(sampleMeans, calculatedMean);
}
#Calculate the covariance matrix
covariance_matrix = cov(inputDataMatrix);
distance = mahalanobis_distance(inputDataMatrix[3,], sampleMeans, covariance_matrix);
print(distance)
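Base R also provides a mahalanobis() function, which returns the squared Mahalanobis distance; it can serve as an illustrative cross-check. Whether mahalanobis_distance() returns the squared distance or its square root is not assumed here, so both values are printed:

squared_md = mahalanobis(inputDataMatrix[3,], colMeans(inputDataMatrix), covariance_matrix); #Squared distance from base R
print(squared_md);
print(sqrt(squared_md)); #Its square root, in case the package reports the non-squared distance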
- Manhattan distance (manhattan_dist())

distance = manhattan_dist(c(1,2), c(3,4));
print(distance);
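Again, a manual base R computation can verify the result, assuming manhattan_dist() implements the usual sum of absolute differences:

manual_manhattan = sum(abs(c(1,2) - c(3,4))); #Sum of absolute coordinate differences
print(manual_manhattan);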
The statistical functions can be used like this:
- Mean (mean_outliersLearn())

mean = mean_outliersLearn(inputData[,1]);
print(mean);
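Assuming mean_outliersLearn() computes the arithmetic mean, the value can be compared against base R:

print(base::mean(inputData[,1])); #Arithmetic mean from base R, for comparison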
- Standard deviation (sd_outliersLearn())

sd = sd_outliersLearn(inputData[,1], mean);
print(sd);
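Note that base R's sd() divides by n-1 (sample standard deviation); if the package uses the population formula (dividing by n), the two values will differ slightly. The comparison below is only illustrative:

print(stats::sd(inputData[,1])); #Sample standard deviation (n-1 denominator)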
- Quantile (quantile_outliersLearn())

q = quantile_outliersLearn(c(12,2,3,4,1,13), 0.60);
print(q);
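Base R's quantile() supports several interpolation types (type = 7 by default), so its result may not match quantile_outliersLearn() exactly; the call below is only meant as a reference point:

print(stats::quantile(c(12,2,3,4,1,13), 0.60)); #Base R 60% quantile with the default interpolation type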
Finally, the data-transforming function:
- Transform to vector (transform_to_vector())

numeric_data = c(1, 2, 3)
character_data = c("a", "b", "c")
logical_data = c(TRUE, FALSE, TRUE)
factor_data = factor(c("A", "B", "A"))
integer_data = as.integer(c(1, 2, 3))
complex_data = complex(real = c(1, 2, 3), imaginary = c(4, 5, 6))
list_data = list(1, "apple", TRUE)
data_frame_data = data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

transformed_numeric = transform_to_vector(numeric_data); print(transformed_numeric);
transformed_character = transform_to_vector(character_data); print(transformed_character);
transformed_logical = transform_to_vector(logical_data); print(transformed_logical);
transformed_factor = transform_to_vector(factor_data); print(transformed_factor);
transformed_integer = transform_to_vector(integer_data); print(transformed_integer);
transformed_complex = transform_to_vector(complex_data); print(transformed_complex);
transformed_list = transform_to_vector(list_data); print(transformed_list);
transformed_data_frame = transform_to_vector(data_frame_data); print(transformed_data_frame);
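To see exactly what each call returns, the class of the transformed objects can be inspected; this is plain base R and makes no assumption about the internals of transform_to_vector():

print(class(transformed_numeric)); #Class of the transformed numeric input
print(class(transformed_data_frame)); #Class of the transformed data.frame input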
Now that the auxiliary functions are understood, the following section details the main algorithms implemented for outlier detection.
The main outlier detection methods implemented in the OutliersLearn package are:
- boxandwhiskers()
- DBSCAN_method()
- knn()
- lof()
- mahalanobis_method()
- z_score_method()
This section is dedicated to showing how to use these algorithm implementations.
- Box and whiskers (boxandwhiskers())

With the tutorial mode deactivated and d=2:
boxandwhiskers(inputData,2,FALSE)
With the tutorial mode activated and d=2:
boxandwhiskers(inputData,2,TRUE)
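Independently of the package, base R's boxplot.stats() reports the values that lie beyond the 1.5*IQR whiskers of each column, which can help interpret the output above (no assumption is made here about how the d argument is used internally):

print(boxplot.stats(inputData$r)$out); #Values beyond the whiskers in column r
print(boxplot.stats(inputData$d)$out); #Values beyond the whiskers in column d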
- DBSCAN (DBSCAN_method())

With the tutorial mode deactivated:
eps = 4; min_pts = 3; DBSCAN_method(inputData, eps, min_pts, FALSE);
With the tutorial mode activated:
eps = 4; min_pts = 3; DBSCAN_method(inputData, eps, min_pts, TRUE);
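To get a feel for the parameters, the same call can be repeated with a smaller eps; with a tighter neighbourhood radius, points need closer neighbours to form a cluster, so more of them are typically reported as noise. This is just a parameter variation of the call above:

DBSCAN_method(inputData, 2, 3, FALSE); #Same data and min_pts = 3, smaller eps = 2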
- K-Nearest Neighbors (knn())

With the tutorial mode deactivated, K=2 and d=3:
knn(inputData,3,2,FALSE)
With the tutorial mode activated, K=2 and d=3:
knn(inputData,3,2,TRUE)
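Keeping the argument order used above (data, d, K, tutorial mode), the effect of the neighbourhood size can be explored by increasing K; this is only a parameter variation, not a different usage pattern:

knn(inputData, 3, 3, FALSE); #Same d=3, larger K=3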
- Local Outlier Factor (lof())

With the tutorial mode deactivated, K=3 and the threshold set to 0.5:
lof(inputData, 3, 0.5, FALSE);
With the tutorial mode activated and same input parameters:
lof(inputData, 3, 0.5, TRUE);
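Since LOF scores close to 1 indicate a point whose local density is similar to that of its neighbours, raising the threshold should flag fewer points. The call below simply varies the threshold used above:

lof(inputData, 3, 1.5, FALSE); #Same K=3, higher threshold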
- Mahalanobis method (mahalanobis_method())

With the tutorial mode deactivated and alpha set to 0.7:
mahalanobis_method(inputData, 0.7, FALSE);
With the tutorial mode activated and same value of alpha:
mahalanobis_method(inputData, 0.7, TRUE);
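For reference, the squared Mahalanobis distance of every row can also be obtained with base R's mahalanobis(); how mahalanobis_method() derives its cutoff from alpha is not assumed here, but a chi-squared quantile is one common choice to compare against:

squared_md = mahalanobis(as.matrix(inputData), colMeans(inputData), cov(inputData)); #Squared distance of each row to the sample mean
print(squared_md);
print(qchisq(0.7, df = ncol(inputData))); #One possible chi-squared cutoff for alpha = 0.7 (illustrative only)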
- Z-score (z_score_method())

With the tutorial mode deactivated and d set to 2:
z_score_method(inputData,2,FALSE);
With the tutorial mode activated and same value of d:
z_score_method(inputData,2,TRUE);
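A manual z-score computation with base R gives an idea of which rows stand out. It is assumed here (not guaranteed) that d acts as the |z| cutoff, and scale() uses the sample standard deviation, so the flagged rows may not coincide exactly with the package output:

z_scores = scale(as.matrix(inputData)); #Center and scale each column
print(which(apply(abs(z_scores) > 2, 1, any))); #Rows with |z| above 2 in at least one column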