prVis: Polynomial-Based Manifold Exploration

Description Usage Arguments Details Value Examples

View source: R/prVis.R

Description

Polynomial-based alternative to t-SNE, UMAP etc.

Usage

1
2
3
4
5
prVis(xy, labels = FALSE, yColumn = ncol (xy), deg = 2, scale = FALSE,
nSubSam = 0, nIntervals = NULL, outliersRemoved = 0, pcaMethod = "prcomp",
,bigData = FALSE, saveOutputs="lastPrVisOut", cex = 0.5, alpha=0)
addRowNums(np, savedPrVisOut, specifyArea = FALSE)
colorCode(colName="",n=256,exps="", savedPrVisOut="lastPrVisOut", cex = 0.5)

Arguments

xy

Data frame with labels, if any, in the last column or can be specified in the yColumn.

labels

If TRUE, have class labels. The column specified by yColumn must be an R factor, unless nIntervals in non-NULL, in which case Y will be discretized to make labels.

yColumn

The column number of the labeled column, the default for yColumn is the last column of xy

deg

Degree of polynomial expansion.

scale

If TRUE, call scale on nonlabels data before generating polynomial terms.

nSubSam

Number of random rows of xy to sample; 0 means use the full dataset.

nIntervals

If Y column is continuous, transform it into a factor with n many levels

outliersRemoved

Specify how many outliers will be removed from the plot calculated using mahalanobis distance. Values between 0 and 1 will be interpreted as percentages (0.99 means remove the 99 percent of the data with the largest mahalanobis distances)

pcaMethod

A string that specifies the method of eigenvector computation, prcomp or RSpectra

bigData

a boolean that specifies whether dataframe should be processed using bigmemory package. data will be stored using as.big.matrix

saveOutputs

name of the file where prVis object will be saved. Use the empty string in order to not save results

cex

Controls the point size for plotting.

alpha

a number between 0 and 1 that that specifies the level of transparency for alpha blending. If alpha is specified then ggplot2 will be used to create the plot.

np

Number of points to add row numbers to in the plot. If no value is provided, rownumbers will be added to all datapoints in the selection

savedPrVisOut

the name of the file where a previous call to prVis was stored

area

A vector in the form of [x_start, x_finish, y_start, y_finish]. x_start, x_finish, y_start, and y_finish should all be between 0 and 1. These values correspond to percentages of the graph from left to right and bottom to top. [0,1,0,1] would specify the entirety of the graph. [0,0.5, 0.5,1] specifies the upper-left quadrant. x_start and y_start must be less than x_finish and y_finish respectivelly.

colName

The name of the column of continuous data that will be used for color coding

n

The number of shades used to color code the values of colName. n and exps should not both be specified at the same time.

exps

a vector of string expressions that will be used to create a factor column for coloring. If user specifies colName, they can not specify exps. Each expression corresponds to a group of the factor that will be created. The format for expression is listed below as a context free grammar.

<expression> ::= <subexpression> [(+|*) <subexpression>]
<subexpression> ::= columnName relationalOperator value

Note: * represents the logical and, + represents the logical or.

Details

A number of "nonlinear" analogs of Principle Components Analysis (PCA) have emerged, such as ICA, t-SNE, UMAP and so on. Intuitively, an approach based on polynomials may be effective too. Specifically, prVis first expands xy to polynomial terms, then applies PCA to the result.

Once a plot is displayed, addRowNums can be used to add row-number IDs of random points, to gain further insight into the data.

colorCode can be used to display color coding for user-specified expressions using the exps argument. colorCode can also be used to color based upon a column of continuous data by using the colName argument.

Value

If saveOutputs is set, a file is R list is created, with the components contained inside of a list called outputList. Two of the components, gpOut, the generated polynomial matrix, and prout, the return value from the call to prcomp will always be contained inside of outputList. Additional information may be included in outputList regarding the y column in colName, yCol, or yname.

Examples

1
2
3
4
5
6
7
8
9
data(peFactors)  # prgeng data, included in pkg
pe1 <- peFactors[,c(1,8,9)] 
z <- prVis(pe1,nSubSam=5000,labels=FALSE)
# get a bunch of streaks; why?
# call addRowNums() (not shown); discover that points on the same streak
# tend to have same combination of sex, education and occupation; moving
# along a streak mainly consists of variying age; call colorCode() (not
# shown) to explore
print('see data/SwissRoll for another example')

matloff/prVis documentation built on Dec. 3, 2019, 9:27 a.m.