knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = TRUE,
  fig.height = 6,
  fig.width = 8
)

Download example data

Here, we use the MNIST package developed by @stillmatic as sample data.

You can install this package from GitHub as follows:

devtools::install_github("stillmatic/MNIST")

Load data

Once stillmatic/MNIST is installed, the training data is available as MNIST::mnist_train.

Example: the digit 8

MNIST::show_digit(MNIST::mnist_train[770,])

Sampling

The data contain 60,000 records, which is rather too many for an ordinary SVD on a typical PC.

We therefore draw a random sample of 10,000 rows here.

df <- MNIST::mnist_train[sample(seq_len(nrow(MNIST::mnist_train)), size=10^4), ]
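Because sample() draws a different subset on every run, the plots further down will vary slightly between runs. A minimal sketch of a reproducible draw follows, assuming a seed is acceptable for your use. The data frame `toy_df` and the seed value are hypothetical stand-ins, so the example runs without the MNIST package.

```r
# Hypothetical toy data standing in for MNIST::mnist_train:
# two "pixel" columns and a label column y.
set.seed(123)  # arbitrary seed, chosen only for illustration
toy_df <- data.frame(
  p1 = runif(1000),
  p2 = runif(1000),
  y  = sample(0:9, 1000, replace = TRUE)
)

# Draw 100 distinct row indices (sample() defaults to replace = FALSE),
# then subset the rows, mirroring the call used for MNIST above.
idx <- sample(seq_len(nrow(toy_df)), size = 100)
toy_sample <- toy_df[idx, ]
```

Calling set.seed() once before the sampling line is enough; the same pattern applies unchanged to MNIST::mnist_train with size = 10^4.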

Plot SVD

Plot the original data on the plane spanned by the first and second singular vectors.

# The last column is the label y; the rest are pixel values, scaled here to [0, 1]
x <- as.matrix(df[, -ncol(df)])/255
y <- df$y
frequentdirections::plot_svd(x, y)
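To make "the first and second singular vector plane" concrete, here is a rough base-R sketch of such a projection. It is not taken from the package and may differ from what plot_svd does internally. The names `toy_x`, `toy_y`, and `coords` are hypothetical, and random data stands in for the MNIST sample.

```r
# Hypothetical toy data: 100 observations, 5 features, labels 0-9.
set.seed(42)
toy_x <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
toy_y <- sample(0:9, 100, replace = TRUE)

# Compute only the first two right singular vectors (columns of v).
s <- svd(toy_x, nu = 0, nv = 2)

# Project every row onto those two directions: a 100 x 2 coordinate matrix.
coords <- toy_x %*% s$v

# Scatter plot of the projected data, coloured by label.
plot(coords, col = toy_y + 1, pch = 19,
     xlab = "1st singular vector", ylab = "2nd singular vector")
```

Each point is one observation; classes that differ along the directions of greatest variance end up in different regions of the plane.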

Matrix Sketching

l = 8 case

eps <- 10^(-8)
# 10000 x 784 -> 8 x 784 matrix (MNIST images are 28 x 28 = 784 pixels)
b <- frequentdirections::sketching(x, 8, eps)
frequentdirections::plot_svd(x, y, b)
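For readers curious how a sketch like the one above can be built, here is a minimal base-R version of the Frequent Directions idea, in the variant that shrinks by the median singular value. It is an illustration under stated assumptions, not the package's implementation. The function `frequent_directions_sketch` and the toy matrices are hypothetical, the `eps` argument of `sketching` is not modelled, and the function assumes l is at most the number of columns.

```r
# Hypothetical helper, not part of the frequentdirections package.
# Streams the rows of `a` into an l x d sketch `b`.
frequent_directions_sketch <- function(a, l) {
  d <- ncol(a)
  stopifnot(l >= 2, l <= d)
  b <- matrix(0, nrow = l, ncol = d)
  half <- ceiling(l / 2)
  next_free <- 1L                       # next all-zero row of b
  for (i in seq_len(nrow(a))) {
    if (next_free > l) {
      # Sketch is full: rotate with the SVD and shrink the squared
      # singular values by the median one, zeroing rows half..l.
      s <- svd(b)
      delta <- s$d[half]^2
      shrunk <- sqrt(pmax(s$d^2 - delta, 0))
      b <- diag(shrunk, nrow = l) %*% t(s$v)
      next_free <- half
    }
    b[next_free, ] <- a[i, ]            # insert the incoming row
    next_free <- next_free + 1L
  }
  b
}

# Toy usage: compress a 500 x 20 matrix into 8 rows.
set.seed(1)
toy_a <- matrix(rnorm(500 * 20), nrow = 500, ncol = 20)
toy_b <- frequent_directions_sketch(toy_a, l = 8)

# Spectral-norm gap between the two Gram matrices, and the
# 2 * ||A||_F^2 / l bound from the original analysis of this variant.
cov_error <- norm(crossprod(toy_a) - crossprod(toy_b), type = "2")
bound <- 2 * sum(toy_a^2) / 8
```

The sketch never holds more than l rows, so memory use does not depend on the number of input rows. Comparing `cov_error` with `bound` for increasing l shows why the plots improve as l grows.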

l = 32 case

# 10000 x 784 -> 32 x 784 matrix
b <- frequentdirections::sketching(x, 32, eps)
frequentdirections::plot_svd(x, y, b)

l = 128 case

# 10000 x 784 -> 128 x 784 matrix
b <- frequentdirections::sketching(x, 128, eps)
frequentdirections::plot_svd(x, y, b)

This result is almost the same as the SVD plot of the original data.

This suggests that the original data can be represented well by a sketch of only 128 rows.



shinichi-takayanagi/frequentdirections documentation built on May 12, 2019, 12:28 a.m.