sim.dpp.modal.nystrom.kmeans: Subsample an observational dataset using the conditional DPP...


View source: R/simdpp.r

Description

sim.dpp.modal.nystrom.kmeans() uses the kmeans-based Nystrom approximation of Zhang and Kwok (2010) to select n design sites from the observational dataset Xin using the DPP-based design emulator of Pratola et al. (2018). The design assumes a Gaussian process regression model with stationary correlation function r(x,x^\prime). The entries of the correlation matrix R are formed by evaluating r(x,x^\prime) over a set of landmarks chosen by the kmeans algorithm, and the resulting eigenvectors are projected into the higher dimensional space using the Nystrom approximation. Additional options for the MiniBatchKmeans() algorithm from package ClusterR can be passed to alter the construction of the landmark set.
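The landmark-plus-extension idea described above can be sketched in base R. This is an illustrative sketch under stated assumptions, not the package's implementation: stats::kmeans() stands in for MiniBatchKmeans(), the helper gauss_corr() is hypothetical, and the Gaussian correlation is assumed to be parameterized as r(x,x^\prime) = prod_j rho_j^((x_j - x^\prime_j)^2), which may differ from what demu uses internally.

```r
# Sketch of a kmeans-based Nystrom approximation (base R only).
set.seed(1)
N <- 200; p <- 2; m <- 20
X <- matrix(runif(N * p), ncol = p)
rho <- rep(0.01, p)

# ASSUMPTION: r(x, x') = prod_j rho_j^((x_j - x'_j)^2); demu may differ.
# gauss_corr() is a hypothetical helper, not part of demu.
gauss_corr <- function(A, B, rho) {
  logR <- matrix(0, nrow(A), nrow(B))
  for (j in seq_len(ncol(A))) {
    logR <- logR + log(rho[j]) * outer(A[, j], B[, j], "-")^2
  }
  exp(logR)
}

# Landmarks: kmeans centroids (stats::kmeans stands in for MiniBatchKmeans).
Z <- kmeans(X, centers = m, nstart = 5, iter.max = 50)$centers

W <- gauss_corr(Z, Z, rho)   # m x m correlation among landmarks
E <- gauss_corr(X, Z, rho)   # N x m cross-correlation, data vs landmarks

# Eigendecomposition of the small matrix; drop near-zero eigenvalues
# so the division below stays numerically stable.
eig  <- eigen(W, symmetric = TRUE)
keep <- eig$values > 1e-8 * eig$values[1]
V    <- eig$vectors[, keep, drop = FALSE]
vals <- eig$values[keep]

# Nystrom extension of the landmark eigenpairs to all N rows.
U_approx      <- sqrt(m / N) * sweep(E %*% V, 2, vals, "/")
lambda_approx <- (N / m) * vals

# Low-rank reconstruction, compared against the full N x N matrix.
R_approx <- U_approx %*% (lambda_approx * t(U_approx))
R_full   <- gauss_corr(X, X, rho)
max(abs(R_full - R_approx))
```

Only the m x m matrix W is eigendecomposed, which is what makes the approach affordable when the number of rows in Xin is large.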

Usage

sim.dpp.modal.nystrom.kmeans(Xin,rho=rep(0.01,ncol(Xin)),
  n,m=max(ceiling(nrow(Xin)*0.1),n),method="KmeansNystrom",
  initializer="kmeans++",...)

Arguments

Xin

An N\times p dataset of observations from which we want to draw subsamples, where N is the number of observations (distinct from the subsample size n).

n

Size of the designed subsample to draw from Xin.

rho

The p\times 1 parameter vector for the Gaussian correlation model.

m

Number of landmark points to use in constructing the kmeans-based Nystrom approximation.

method

Type of approximation to use. Currently only supports “KmeansNystrom”.

initializer

Initialization to use in the Kmeans algorithm, default is “kmeans++”.

...

Additional options to pass to MiniBatchKmeans() for selecting the landmark points.
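As a small illustration of the arguments above, the sketch below evaluates the default for m given in Usage and then shows one way extra options might be forwarded through `...`. The helper default_m() is hypothetical and only mirrors the default expression. batch_size and max_iters are arguments of ClusterR::MiniBatchKmeans(); that they pass through unchanged is assumed from the description of `...`, not verified against the package source.

```r
# Default number of landmarks, mirroring the Usage signature:
#   m = max(ceiling(nrow(Xin) * 0.1), n)
# default_m() is a hypothetical helper, not part of demu.
default_m <- function(N, n) max(ceiling(N * 0.1), n)

default_m(500, 50)    # 10% of 500 rows equals n
default_m(1000, 50)   # 10% of 1000 rows exceeds n, so the 10% rule wins
default_m(200, 50)    # 10% of 200 rows is below n, so n is used

# Assumed usage: override m and forward options to MiniBatchKmeans().
# Runs only if demu is installed.
if (requireNamespace("demu", quietly = TRUE)) {
  X <- matrix(runif(500 * 5), ncol = 5)
  samp <- demu::sim.dpp.modal.nystrom.kmeans(
    X, rho = rep(0.01, 5), n = 50,
    m = 100,           # more landmarks than the default
    batch_size = 20,   # MiniBatchKmeans() option
    max_iters = 50     # MiniBatchKmeans() option
  )
}
```

Because the default takes the maximum of the two quantities, m never falls below the requested subsample size n.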

Details

For more details on the method, see Pratola et al. (2018). Detailed examples demonstrating the method are available at http://www.matthewpratola.com/software.

Value

A list containing a matrix that is the union of the observation matrix Xin and the selected landmark sites, the indices into this matrix of the selected design sites, and a matrix of the design sites. In the Examples these components are accessed as X, pts and design, respectively.

References

Pratola, Matthew T., Lin, C. Devon, and Craigmile, Peter. (2018) Optimal Design Emulators: A Point Process Approach. arXiv:1804.02089.

Zhang, Kai and Kwok, James T. (2010) Clustered Nystrom method for large scale manifold learning and dimension reduction. IEEE Transactions on Neural Networks, 21.10, 1576–1587. doi: 10.1109/TNN.2010.2064786

See Also

demu-package sim.dpp.modal sim.dpp.modal.nystrom

Examples

library(demu)

# Fake dataset in 5 dimensions
X=matrix(runif(500*5),ncol=5)
rho=rep(0.01,5)
n=50
samp=sim.dpp.modal.nystrom.kmeans(X,rho,n)
samp$design

# Could plot the result:
# pchvec=rep(1,nrow(samp$X))
# pchvec[samp$pts]=20
# cexvec=rep(0.1,nrow(samp$X))
# cexvec[samp$pts]=1
# colvec=rep("black",nrow(samp$X))
# colvec[samp$pts]="red"
# pairs(samp$X,pch=pchvec,cex=cexvec,col=colvec,xlim=c(0,1),ylim=c(0,1))

demu documentation built on Jan. 13, 2020, 5:06 p.m.