qkdbscan: qKernel-DBSCAN density reachability and connectivity...


Description

Similar to the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, qKernel-DBSCAN is a density-based clustering algorithm that can be applied in both linear and non-linear settings.

Usage

## S4 method for signature 'matrix'
qkdbscan(x, kernel = "rbfbase", qpar = list(sigma = 0.1, q = 0.9),
         eps = 0.25, MinPts = 5, hybrid = TRUE, seeds = TRUE, showplot = FALSE,
         countmode = NULL, na.action = na.omit, ...)

## S4 method for signature 'cndkernmatrix'
qkdbscan(x, eps = 0.25, MinPts = 5, seeds = TRUE,
         showplot = FALSE, countmode = NULL, ...)

## S4 method for signature 'qkernmatrix'
qkdbscan(x, eps = 0.25, MinPts = 5, seeds = TRUE,
         showplot = FALSE, countmode = NULL, ...)

## S4 method for signature 'qkdbscan'
predict(object, data, newdata = NULL, predict.max = 1000, ...)

Arguments

x

the data matrix indexed by row, or a kernel matrix of class cndkernmatrix or qkernmatrix.

kernel

the kernel function used in training and predicting. This parameter can be set to any function, of class kernel, which computes a kernel function value between two vector arguments. qkerntool provides the most popular kernel functions which can be used by setting the kernel parameter to the following strings:

  • rbfbase Radial Basis qkernel function "Gaussian"

  • nonlbase Non Linear qkernel function

  • laplbase Laplacian qkernel function

  • ratibase Rational Quadratic qkernel function

  • multbase Multiquadric qkernel function

  • invbase Inverse Multiquadric qkernel function

  • wavbase Wave qkernel function

  • powbase Power qkernel function

  • logbase Log qkernel function

  • caubase Cauchy qkernel function

  • chibase Chi-Square qkernel function

  • studbase Generalized T-Student qkernel function

  • nonlcnd Non Linear cndkernel function

  • polycnd Polynomial cndkernel function

  • rbfcnd Radial Basis cndkernel function "Gaussian"

  • laplcnd Laplacian cndkernel function

  • anocnd ANOVA cndkernel function

  • raticnd Rational Quadratic cndkernel function

  • multcnd Multiquadric cndkernel function

  • invcnd Inverse Multiquadric cndkernel function

  • wavcnd Wave cndkernel function

  • powcnd Power cndkernel function

  • logcnd Log cndkernel function

  • caucnd Cauchy cndkernel function

  • chicnd Chi-Square cndkernel function

  • studcnd Generalized T-Student cndkernel function

The kernel parameter can also be set to a user defined function of class kernel by passing the function name as an argument.

qpar

the list of hyper-parameters (kernel parameters). This is a list which contains the parameters to be used with the kernel function. Valid parameters for existing kernels are:

  • sigma, q for the Radial Basis qkernel function "rbfbase", the Laplacian qkernel function "laplbase" and the Cauchy qkernel function "caubase".

  • alpha, q for the Non Linear qkernel function "nonlbase".

  • c, q for the Rational Quadratic qkernel function "ratibase", the Multiquadric qkernel function "multbase" and the Inverse Multiquadric qkernel function "invbase".

  • theta, q for the Wave qkernel function "wavbase".

  • d, q for the Power qkernel function "powbase", the Log qkernel function "logbase" and the Generalized T-Student qkernel function "studbase".

  • alpha for the Non Linear cndkernel function "nonlcnd".

  • power, alpha, c for the Polynomial cndkernel function "polycnd".

  • gamma for the Radial Basis cndkernel function "rbfcnd", the Laplacian cndkernel function "laplcnd" and the Cauchy cndkernel function "caucnd".

  • power, sigma for the ANOVA cndkernel function "anocnd".

  • c for the Rational Quadratic cndkernel function "raticnd", the Multiquadric cndkernel function "multcnd" and the Inverse Multiquadric cndkernel function "invcnd".

  • theta for the Wave cndkernel function "wavcnd".

  • power for the Power cndkernel function "powcnd", the Log cndkernel function "logcnd" and the Generalized T-Student cndkernel function "studcnd".

Hyper-parameters for user defined kernels can be passed through the qpar parameter as well.

eps

reachability distance, see Ester et al. (1996). (default: 0.25)

MinPts

reachability minimum number of points, see Ester et al. (1996). (default: 5)

hybrid

whether the algorithm expects raw data but calculates partial distance matrices; can be TRUE or FALSE.

seeds

can be TRUE or FALSE; set to FALSE to omit the isseed vector from the returned object.

showplot

whether to show the plot or not, can be TRUE or FALSE

na.action

a function to specify the action to be taken if NAs are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which causes an error if NA cases are found. (NOTE: If given, this argument must be named.)

countmode

NULL or vector of point numbers at which to report progress.

object

object of class qkdbscan, as returned by qkdbscan.

data

matrix or data.frame.

newdata

matrix or data.frame with raw data to predict.

predict.max

max. batch size for predictions.

...

Further arguments transferred to plot methods.
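The roles of eps and MinPts can be illustrated with a minimal base R sketch of the core-point test from Ester et al. (1996). It uses plain Euclidean distances on made-up toy data, whereas qkdbscan works with kernel-induced distances, and it assumes that a point counts as a member of its own neighbourhood. It is not the package's implementation.

```r
# Toy data (made up): two tight groups of five points and one isolated point.
x <- rbind(
  c(0.00, 0.00), c(0.10, 0.00), c(0.00, 0.10), c(0.10, 0.10), c(0.05, 0.05),
  c(2.00, 2.00), c(2.10, 2.00), c(2.00, 2.10), c(2.10, 2.10), c(2.05, 2.05),
  c(5.00, 5.00)
)

eps    <- 0.25  # neighbourhood radius (same value as the qkdbscan default)
MinPts <- 5     # minimum neighbourhood size (same value as the qkdbscan default)

# Pairwise Euclidean distances (assumption: stands in for the kernel distance).
d <- as.matrix(dist(x))

# Size of each eps-neighbourhood, counting the point itself (assumption).
n_neighbours <- rowSums(d <= eps)

# A point is a core point when its neighbourhood holds at least MinPts points.
is_core <- n_neighbours >= MinPts

print(data.frame(n_neighbours, is_core))
```

In this toy set the ten grouped points pass the test and the isolated point does not, so it could only be labelled as noise. Raising MinPts or lowering eps makes the test stricter.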

Details

The data can be passed to the qkdbscan function as a matrix. In addition, qkdbscan supports input in the form of a kernel matrix of class qkernmatrix or class cndkernmatrix.
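The two input routes might be used as in the sketch below. It assumes that the package exports a kernel generator rbfbase(sigma, q) and a function qkernmatrix(qkernel, x), as the See Also entry suggests; those signatures are not shown on this page, so check them against the installed version. The parameter values are simply the defaults from Usage, not tuned settings, and the calls are guarded so that nothing runs when qkerntool is not installed.

```r
data(iris)
x <- as.matrix(iris[, -5])  # numeric columns only

ds_raw <- NULL  # result of the raw-data route
ds_km  <- NULL  # result of the kernel-matrix route

if (requireNamespace("qkerntool", quietly = TRUE)) {
  library(qkerntool)

  # Route 1: raw data matrix; the kernel is named and parameterised via qpar.
  ds_raw <- qkdbscan(x, kernel = "rbfbase", qpar = list(sigma = 0.1, q = 0.9),
                     eps = 0.25, MinPts = 5)

  # Route 2: precomputed kernel matrix.
  # Assumption: rbfbase() and qkernmatrix() exist with these signatures.
  qk    <- rbfbase(sigma = 0.1, q = 0.9)
  km    <- qkernmatrix(qk, x)
  ds_km <- qkdbscan(km, eps = 0.25, MinPts = 5)
}
```

The kernel-matrix route can be convenient when the same matrix is reused across several eps and MinPts settings, since the kernel is then evaluated only once.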

Value

predict (qkdbscan method) returns a vector of predicted clusters for the points in newdata.

qkdbscan returns an S4 object containing the following components:

clust

integer vector coding cluster membership with noise observations (singletons) coded as 0

eps

parameter eps

MinPts

parameter MinPts

kcall

the function call

cndkernf

the kernel function used

xmatrix

the original data matrix

All the slots of the object can be accessed by accessor functions.
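A membership vector in the clust format (0 for noise, positive integers for clusters) can be summarised with base R alone. The vector below is hypothetical and stands in for the one stored in a fitted object; how it is extracted from the object is not shown here.

```r
# Hypothetical membership vector: 0 = noise, positive integers = cluster labels.
clust <- c(1L, 1L, 2L, 0L, 2L, 1L, 0L, 2L, 2L, 1L)

# Number of observations per label, including the noise label 0.
sizes <- table(clust)

# Share of observations labelled as noise.
noise_frac <- mean(clust == 0L)

# Number of clusters found, not counting noise.
n_clusters <- length(setdiff(unique(clust), 0L))

# Plotting colours: shift by one so that noise (0) maps to colour 1.
plot_cols <- clust + 1L

print(sizes)
```

The shift by one mirrors the col = as.integer(1 + emb) idiom in the Examples section, since colour index 0 would otherwise leave noise points undrawn.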

Note

The predict function can be used to assign new data to the clusters found.

Author(s)

Yusen Zhang
yusenzhang@126.com

References

Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996).
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.
Institute for Computer Science, University of Munich.
Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96).

See Also

qkernmatrix, cndkernmatrix

Examples

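# a simple example using the iris data set
data(iris)
# hold out 20 randomly chosen rows for prediction
test <- sample(1:150, 20)
x <- as.matrix(iris[-test, -5])
# cluster the remaining rows with the Laplacian qkernel
ds <- qkdbscan(x, kernel = "laplbase", qpar = list(sigma = 3.5, q = 0.8),
               eps = 0.15, MinPts = 5, hybrid = FALSE)
plot(ds, x)
# predict cluster labels for the held-out rows and overlay them on the plot
emb <- predict(ds, x, as.matrix(iris[test, -5]))
points(iris[test, ], col = as.integer(1 + emb))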

qkerntool documentation built on May 2, 2019, 6:11 a.m.