DensityScatter.DDCAL: Scatter density plot [Brinkmann et al., 2023]

DensityScatter.DDCAL    R Documentation

Scatter density plot [Brinkmann et al., 2023]

Description

Creates a scatter density plot in which the density is estimated either with Pareto density estimation (PDE) [Ultsch, 2005] or with "SDH" [Eilers/Goeman, 2004]. The density values are clustered with DDCAL [Lux/Rinderle-Ma, 2023] to obtain the color transitions, as proposed by [Brinkmann et al., 2023].

Usage

DensityScatter.DDCAL(X, Y, nClusters = 12, Plotter = "native",
  SDHorPDE = TRUE, LimitShownPoints = FALSE,
  Marginals = FALSE, na.rm = TRUE, pch, Size,
  xlab = "x", ylab = "y", main = "", lwd = 2,
  xlim = NULL, ylim = NULL, Polygon, BW = TRUE, Silent = FALSE, ...)

Arguments

X

Numeric vector [1:n], first feature (for x axis values)

Y

Numeric vector [1:n], second feature (for y axis values)

nClusters

(Optional) Integer defining the number of clusters (colors) used for finding a hard color transition, default is 12.

Plotter

(Optional) String, name of the plotting backend to use. Possible values are "native", "plotly", or "ggplot2". Default is "native"

SDHorPDE

(Optional) Boolean, if TRUE, SDH is used to calculate the density; if FALSE, PDE is used. Default is TRUE

LimitShownPoints

(Optional) Boolean, if FALSE, all points are shown; if TRUE, an optimal number of points for visualization is sampled using SampleScatter. Default is FALSE

Marginals

(Optional) Boolean, if TRUE the marginal distributions of X and Y will be plotted together with the 2D density of X and Y. Default is FALSE

na.rm

(Optional) Boolean, if TRUE, non-finite values will be removed. Default is TRUE

pch

(Optional) Scalar or character, shape of the data points; see the plot() function, the symbol argument in the plotly package, or the shape argument in the ggplot2 package. Default is 20 for "native" and "ggplot2", and 0 for "plotly"

Size

(Optional) Scalar, size of data points in plot, default is 1 for native, 6 for plotly, and 3 for ggplot2

xlab

(Optional) String, title of the x axis. Default: "x", see the plot() function or similar functionality in plotly or ggplot2

ylab

(Optional) String, title of the y axis. Default: "y", see the plot() function or similar functionality in plotly or ggplot2

main

(Optional) Character, title of the plot.

lwd

(Optional) Scalar, thickness of the lines used for the marginal distributions (only needed if Marginals=TRUE), see plot(). Default = 2

xlim

(Optional) Numeric vector, min and max of the x values to be plotted

ylim

(Optional) Numeric vector, min and max of the y values to be plotted

Polygon

(Optional) [1:p,1:2] numeric matrix with the x and y coordinates of a polygon, which is drawn in magenta

BW

(Optional) Boolean, if TRUE and Plotter="ggplot2", a white background is used; if FALSE and Plotter="ggplot2", the typical ggplot2 background is used. Not needed if Plotter="native". Default is TRUE

Silent

(Optional) Boolean, if TRUE no messages will be printed, default is FALSE

...

Further plot arguments
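The following call is a minimal sketch of how several of the optional arguments above might be combined. It uses only argument names from the Usage section. The data, the rectangle stored in poly, and the choice of 8 clusters are illustrative assumptions, and it is not confirmed here whether the polygon has to be closed explicitly. The call is guarded so that it only runs if the ScatterDensity package is installed.

```r
# Hypothetical polygon: a rectangle around the second mode (illustrative values only)
poly <- cbind(c(1.5, 3.5, 3.5, 1.5),
              c(1.5, 1.5, 3.5, 3.5))

if (requireNamespace("ScatterDensity", quietly = TRUE)) {
  set.seed(42)
  # Paired x and y values with two modes
  x <- c(rnorm(2000, mean = 0), rnorm(2000, mean = 2.5))
  y <- c(rnorm(2000, mean = 0), rnorm(2000, mean = 2.5))

  # PDE instead of SDH, fewer color clusters, custom axis titles,
  # the polygon overlay, and no messages
  ScatterDensity::DensityScatter.DDCAL(
    x, y,
    nClusters = 8,
    SDHorPDE  = FALSE,
    Polygon   = poly,
    xlab      = "feature 1",
    ylab      = "feature 2",
    Silent    = TRUE
  )
}
```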

Details

The DensityScatter.DDCAL function estimates the density of the xy data and uses it as a z coordinate. Afterwards, xyz is plotted as a contour plot. It assumes that the cases of x and y are mapped to each other, meaning that a cbind(x,y) operation is allowed. The colors for the densities in the contour plot are calculated with DDCAL, which clusters the densities so that they are evenly distributed into low-variance clusters.

If Plotter is "native", the returned handle is NULL because the base R function plot() is used.

For details on the returned density values, see SmoothedDensitiesXY or PDEscatter, depending on the input parameter SDHorPDE.
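The pipeline described in this section (pair x and y, derive a density value z per point, group the density values into a fixed number of classes, color the points by class) can be sketched in base R. This is not the package's implementation. The density here is a crude 2D histogram count rather than SDH or PDE, and the classes come from equal-frequency binning with quantile() as a stand-in for DDCAL. All variable names are assumptions chosen for illustration.

```r
set.seed(1)
x <- c(rnorm(500, mean = 0), rnorm(500, mean = 2.5))
y <- c(rnorm(500, mean = 0), rnorm(500, mean = 2.5))

# x and y must describe the same cases, so cbind(x, y) has to be valid
stopifnot(length(x) == length(y))
xy <- cbind(x, y)
xy <- xy[is.finite(xy[, 1]) & is.finite(xy[, 2]), , drop = FALSE]

# Stand-in density: number of points in each cell of a 25 x 25 grid
nbins <- 25
bx <- cut(xy[, 1], breaks = nbins, labels = FALSE)
by <- cut(xy[, 2], breaks = nbins, labels = FALSE)
counts <- table(factor(bx, levels = 1:nbins), factor(by, levels = 1:nbins))
z <- as.numeric(counts[cbind(bx, by)])  # density value per point

# Stand-in for DDCAL: equal-frequency classes of the density values
nClusters <- 12
brks <- unique(quantile(z, probs = seq(0, 1, length.out = nClusters + 1)))
cls  <- cut(z, breaks = brks, include.lowest = TRUE, labels = FALSE)

# One color per class gives hard color transitions between density levels
cols <- hcl.colors(max(cls), palette = "viridis")[cls]
plot(xy, col = cols, pch = 20, xlab = "x", ylab = "y")
```

Equal-frequency binning only balances the class sizes. DDCAL additionally aims for low variance within each class, so the colors of the actual function will generally differ from this sketch.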

Value

Returns an invisible list with

DF

[1:m,1:5] of density values, x values, y values, colors, and the classification vector Cls. m=n if LimitShownPoints=FALSE; if LimitShownPoints=TRUE, m<n because a subsample is taken

PlotHandle

The plotting handle: a plotly object, a ggplot2 object, or NULL, depending on the input parameter Plotter
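Because the list is returned invisibly, it has to be assigned to be inspected. The sketch below assumes that the ScatterDensity and ggplot2 packages are installed and is skipped otherwise. The element names DF and PlotHandle are taken from the description above, while the variable name res and the example data are illustrative. The column names of DF are not documented here, so str() is used instead of addressing columns by name.

```r
res <- NULL

if (requireNamespace("ScatterDensity", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  set.seed(7)
  x <- c(rnorm(1000, mean = 0), rnorm(1000, mean = 2.5))
  y <- c(rnorm(1000, mean = 0), rnorm(1000, mean = 2.5))

  # Assign the result; with Plotter = "ggplot2" the handle should not be NULL
  res <- ScatterDensity::DensityScatter.DDCAL(x, y, Plotter = "ggplot2",
                                              Silent = TRUE)

  str(res$DF)            # density, x, y, colors, and Cls for each shown point
  print(res$PlotHandle)  # draw the stored plot again
}
```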

Author(s)

Luca Brinkmann, Michael Thrun

References

[Ultsch, 2005] Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, In Baier, D. & Wernecke, K. D. (Eds.), Innovations in classification, data science, and information systems, (Vol. 27, pp. 91-100), Berlin, Germany, Springer, 2005.

[Eilers/Goeman, 2004] Eilers, P. H., & Goeman, J. J.: Enhancing scatterplots with smoothed densities, Bioinformatics, Vol. 20(5), pp. 623-628, 2004.

[Lux/Rinderle-Ma, 2023] Lux, M. & Rinderle-Ma, S.: DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling, Journal of Classification, Vol. 40, pp. 106-144, 2023.

[Brinkmann et al., 2023] Brinkmann, L., Stier, Q., & Thrun, M. C.: Computing Sensitive Color Transitions for the Identification of Two-Dimensional Structures, Proc. Data Science, Statistics & Visualisation (DSSV) and the European Conference on Data Analysis (ECDA), p.109, Antwerp, Belgium, July 5-7, 2023.

Examples




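# Load the package that provides DensityScatter.DDCAL
library(ScatterDensity)

# Create two Gaussian modes so that x and y are each bimodal
x1 <- rnorm(n = 7500, mean = 0, sd = 1)
y1 <- rnorm(n = 7500, mean = 0, sd = 1)
x2 <- rnorm(n = 7500, mean = 2.5, sd = 1)
y2 <- rnorm(n = 7500, mean = 2.5, sd = 1)
x <- c(x1, x2)
y <- c(y1, y2)

# Plot the scatter density together with the marginal distributions
DensityScatter.DDCAL(x, y, Marginals = TRUE)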


ScatterDensity documentation built on April 15, 2025, 5:09 p.m.