tdaSweep: TDAsweep for Dimension Reduction in Image Classification

Description Usage Arguments Details Value Author(s) Examples

Description

Functions implementing the TDAsweep method for dimension reduction of image data.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
tdaFit(
  images,
  labels,
  nr,
  nc,
  rgb = TRUE,
  thresholds = 0, 
  intervalWidth = 1,
  cls = NULL,
  rcOnly = FALSE,
  qeFtn,
  mlFtnArgs = NULL
) 
  
predict.tdaFit(
  tdaFitObject,
  newImages
)


TDAsweep(
  images,
  labels,
  nr,
  nc,
  rgb=TRUE, 
  thresholds = 0,
  intervalWidth=1,
  cls=NULL,
  prep=FALSE,
  rcOnly=FALSE
) 

TDAsweepOneImg(
  i2D,
  nr,
  nc,
  valType='raw',
  intervalWidth=1,
  rcOnly=FALSE
) 
   
   
TDAsweepImgSet(
  imgsPrepped,
  nr,
  nc,
  valType='1Dcells',
  intervalWidth=1,
  rcOnly=FALSE
) 

prepOneImage(
  img2D,
  thresh
) 

prepImgSet(
  imgs,
  nr,
  labels,
  thresh
) 

  

Arguments

images

Matrix or data frame of image dataset, one image per row.

labels

Vector or R factor, one element per row of images.

nr

Number of rows per image.

nc

Number of columns per image. Must have nr * nc = ncol(images).

rgb

TRUE indicates color images.

thresholds

Vector of TDAsweep thresholds.

intervalWidth

Number of rows etc. in a TDAsweep group.

cls

Number of Clusters for parallel computation.

prep

If the images are already in proper format, produced by prepImgSet function.

rconly

Perform row and column sweeps only, no diagonals.

qeFtn

Quoted name of desired qe*-series function.

mlFtnArgs

R list of optional arguments for the qe*-series function.

tdaFitObject

An object returned by tdaFit.

newImages

Matrix or data frame of new images to be predicted, in the same form that had been input to tdaFit.

i2D

Output of regtools::imgTo2D(), with row number, column number, intensity for each one of a filtered set of pixels

valType

Type of return value, currently 'raw' or '1Dcells'; the former means the raw counts, not grouped into intervals, while the latter means grouped

img2D

Output of regtools::imgTo2D() for a single image; each row is of form (row number,column number,intensity), storing information for a given pixel

imgsPrepped

Output from prepImgSet()

Details

The function tdaFit is offered for convenience, a "turnkey" tool. It performs both the tdaSweep and model-fitting steps. The paired prediction function, predict.tdaFit, is similarly integrated. Model-fitting is done via the qe*-series ("quick and easy") from regtools, offering logistic, multi-outcome linear, random forests, gradient boosting, SVM and neural networks. This wrapper thus enables the user to focus better on choosing hyperparameters and so on.

The function TDAsweep is the wrapper function to perform TDAsweep. The function wraps up functions TDAsweepOneImg(), TDAsweepImgSet(), prepOneImage(), and prepImgSet() to create a complete pipeline for TDAsweep. Specifically, the function formats the input image dataset to an appropriate format for TDAsweep and sweeps in the input image dataset in four directions (column, row, NW to SE, and NE to SW).

As mentioned above, functions TDAsweepOneImg(), TDAsweepImgSet(), prepOneImage(), and prepImgSet() are the building block functions of TDAsweep, which are integrated in the wrapper function TDAsweep(). These could be called for experimental or debugging purposes.

Value

The function tdaFit returns an object of type tdaFit, suitable for input to predict.tdaFit, called as predict. One component of the object, testAcc, shows the overall probability of correct classification on a holdout set.

The function TDAsweep returns a S3 class object called sweepOut, which contains the reduced dataset, number of samples, number of features, thresholds specified, and the intervalWidth specified. Specifically, the user can use the reduced dataset for input as a train set to a machine learning classification model of choice.

Author(s)

Norm Matloff

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
## Not run: 
# this example shows the use of tdaFit and predict.tdaFit()
# need to first get the MNIST data, in form required for 'images'
# arguments; one way is 
mnist <- getMNIST()
idxs <- sample(1:nrow(mnist),10000)  # keep the scale small in this example
x <- mnist[idxs,-785]
y <- mnist[idxs,785]
# fit, and predict first few
tfout <- tdaFit(x,y,28,28,FALSE,c(100,175),qeFtn='qeRF') 
predict(tfout,x[1:3,]) 
# performance on holdout set (within training set)
tfout$testAcc

# fit a gradient boosting model, with optional parameters
tfout <- tdaFit(x,y,28,28,FALSE,c(100,175),qeFtn='qeGBoost',
   mlFtnArgs=list(nTree=500,minNodeSize = 20))


## End(Not run)

## Not run: 
# This example shows the use of TDAsweep(), along with e1071 SVM as the classification model on the famous MNIST dataset.
library(tdaImage)  
library(e1071)

#---- data preparation ----#
# will need to first prepare the MNIST dataset. One way to get it: https://www.kaggle.com/c/digit-recognizer
mnist <- read.csv("PATH TO MNIST.CSV")
mnist$y <- as.factor(mnist$y)
set.seed(1)
train_idx <- sample(seq_len(nrow(mnist)), 0.8*nrow(mnist))  # simple sampling
train_set <- mnist[train_idx, -785]  # exclude label if doing tda
train_y_true <- mnist[train_idx, 785]
test_set <- mnist[-train_idx, -785]
test_y_true <- mnist[-train_idx, 785]

#---- parameters for performing TDAsweep ----#
nr = 28  # mnist is 28x28
nc = 28
rgb = FALSE  # mnist is grey scaled
thresholds = c(50)  # set one threshold, 50
intervalWidth = 1  # set intervalWidth to 1

#---- performing tda on train set ----#
tda_train_set <- tda_wrapper_func(image=train_set, labels=train_y_true, 
                                        nr=nr, nc=nc, rgb=rgb, thresh=thresholds,
                                        intervalWidth=intervalWidth)
dim(tda_train_set)  # 784 -> 166 features after TDAsweep
tda_train_set <- as.data.frame(tda_train_set)
tda_train_set$labels <- as.factor(tda_train_set$labels)

#---- performing tda on test set ----#
tda_test_set <- tda_wrapper_func(image=test_set, labels=test_y_true,
                                        nr=nr, nc=nc, rgb=rgb, thresh=thresholds,
                                        intervalWidth=intervalWidth)
tda_test_set <- as.data.frame(tda_test_set)
tda_test_label <- tda_test_set$labels
tda_test <- tda_test_set[, -167]  # take out labels for testing the svm model later

#---- training and predicting using e1071 svm model ----#
system.time(svm_model <- svm(labels ~., data=tda_train_set))
predict <- predict(svm_model, newdata=tda_test)

#---- Evaluation ----#
mean(predict == tda_test_label) # accuracy on test set


## End(Not run)

matloff/tdaImage documentation built on Aug. 14, 2021, 7:46 a.m.