phate: Run PHATE on an input data matrix


View source: R/phate.R

Description

PHATE is a dimensionality reduction method designed specifically for visualizing high-dimensional data in low-dimensional spaces.

Usage

phate(
  data,
  ndim = 2,
  knn = 5,
  decay = 40,
  n.landmark = 2000,
  gamma = 1,
  t = "auto",
  mds.solver = "sgd",
  knn.dist.method = "euclidean",
  knn.max = NULL,
  init = NULL,
  mds.method = "metric",
  mds.dist.method = "euclidean",
  t.max = 100,
  npca = 100,
  plot.optimal.t = FALSE,
  verbose = 1,
  n.jobs = 1,
  seed = NULL,
  potential.method = NULL,
  k = NULL,
  alpha = NULL,
  use.alpha = NULL,
  ...
)

Arguments

data

matrix (n_samples, n_dimensions). Two-dimensional input data array with n_samples samples and n_dimensions dimensions. If knn.dist.method is 'precomputed', data is treated as an (n_samples, n_samples) distance or affinity matrix.

ndim

int, optional, default: 2. Number of dimensions in which the data will be embedded.

knn

int, optional, default: 5. Number of nearest neighbors on which to build the kernel.

decay

int, optional, default: 40. Sets the decay rate of the kernel tails. If NULL, the alpha-decaying kernel is not used.

n.landmark

int, optional, default: 2000. Number of landmarks to use in fast PHATE.

gamma

float, optional, default: 1. Informational distance constant between -1 and 1. gamma=1 gives the PHATE log potential; gamma=0 gives a square root potential.
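To make the role of gamma concrete, here is a minimal sketch of one potential family that reduces to a log potential at gamma=1 and a square root potential at gamma=0. The helper `gamma_potential` and its `eps` guard are hypothetical illustrations, not part of phateR, and the exact form used internally is an assumption not stated on this page.

```r
# Hypothetical helper (not part of phateR): maps diffused transition
# probabilities p to a potential for a given gamma in [-1, 1].
# The functional form is an assumption made for illustration.
gamma_potential <- function(p, gamma = 1, eps = 1e-7) {
  stopifnot(gamma >= -1, gamma <= 1)
  if (gamma == 1) {
    # Log potential; eps guards against log(0)
    return(-log(p + eps))
  }
  exponent <- (1 - gamma) / 2
  p^exponent / exponent
}

p <- c(0.25, 0.04)
pot.sqrt <- gamma_potential(p, gamma = 0)   # equals 2 * sqrt(p)
pot.lin  <- gamma_potential(p, gamma = -1)  # equals p
pot.log  <- gamma_potential(p, gamma = 1)   # equals -log(p + eps)
```

Under this assumed form, intermediate gamma values interpolate between the linear, square root, and log behaviors.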

t

int or 'auto', optional, default: 'auto'. Power to which the diffusion operator is raised; this sets the level of diffusion. If 'auto', t is selected automatically, testing values up to t.max.

mds.solver

string, one of 'sgd' or 'smacof', optional, default: 'sgd'. Solver to use for metric MDS. SGD is substantially faster but produces slightly less optimal results. Note that SMACOF was used for all figures in the PHATE paper.

knn.dist.method

string, optional, default: 'euclidean'. Distance metric for building the kNN graph. Recommended values: 'euclidean', 'cosine', 'precomputed'; any metric from scipy.spatial.distance can be used. If 'precomputed', data should be an (n_samples, n_samples) distance or affinity matrix. Distance matrices are assumed to have zeros down the diagonal, while affinity matrices are assumed to have non-zero values down the diagonal. This is detected automatically using the first diagonal entry of data (data[0,0] in Python's zero-based indexing). You can override this detection with knn.dist.method='precomputed_distance' or knn.dist.method='precomputed_affinity'.
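As an illustration of the precomputed case, the sketch below builds a Euclidean distance matrix with base R and passes it to phate with the detection overridden explicitly. The toy data and the variable names `x`, `dist.mat`, and `phate.pre` are made up for the example; the phate call is guarded so that it only runs when phateR and the Python phate module are available.

```r
set.seed(42)
x <- matrix(rnorm(50 * 10), nrow = 50, ncol = 10)  # 50 samples, 10 features

# Pairwise Euclidean distances as a full (n_samples, n_samples) matrix
dist.mat <- as.matrix(dist(x, method = "euclidean"))

# A distance matrix is square with zeros down the diagonal
stopifnot(nrow(dist.mat) == ncol(dist.mat), all(diag(dist.mat) == 0))

if (requireNamespace("phateR", quietly = TRUE) &&
    reticulate::py_module_available("phate")) {
  # Name the matrix type explicitly instead of relying on auto-detection
  phate.pre <- phateR::phate(dist.mat,
                             knn.dist.method = "precomputed_distance")
}
```

Passing 'precomputed_distance' rather than 'precomputed' avoids depending on the diagonal check when the matrix type is already known.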

knn.max

int, optional, default: NULL. Maximum number of neighbors for which the alpha-decaying kernel is computed for each point. For very large datasets, setting knn.max to a small multiple of knn can speed up computation significantly.

init

phate object, optional. Object to use for initialization. Avoids recomputing intermediate steps if parameters are the same.

mds.method

string, optional, default: 'metric'. MDS algorithm used for dimensionality reduction; choose from 'classic', 'metric', and 'nonmetric'.

mds.dist.method

string, optional, default: 'euclidean'. Distance metric used for MDS. Recommended values: 'euclidean' and 'cosine'.

t.max

int, optional, default: 100. Maximum value of t to test for automatic t selection.

npca

int, optional, default: 100. Number of principal components to use for calculating neighborhoods. For extremely large datasets, using npca < 20 allows neighborhoods to be calculated in log(n_samples) time.

plot.optimal.t

boolean, optional, default: FALSE. If TRUE, produce a plot showing the Von Neumann Entropy curve for automatic t selection.
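For intuition about what such a curve looks like, the following sketch computes a Von Neumann entropy for each power t of a toy spectrum. The helper `von_neumann_entropy` and the eigenvalues are hypothetical, and this assumes the entropy is taken over the normalized, powered eigenvalues of the diffusion operator, which this page does not spell out.

```r
# Hypothetical helper (not part of phateR): entropy of the normalized
# spectrum eigenvalues^t for t = 1, ..., t.max.
von_neumann_entropy <- function(eigenvalues, t.max = 100) {
  vapply(seq_len(t.max), function(t) {
    powered <- eigenvalues^t
    prob <- powered / sum(powered)   # normalize to a probability vector
    prob <- prob[prob > 0]           # drop zeros so 0 * log(0) never occurs
    -sum(prob * log(prob))
  }, numeric(1))
}

eigenvalues <- c(1, 0.9, 0.5, 0.1)   # toy spectrum, made up for the example
vne <- von_neumann_entropy(eigenvalues, t.max = 20)
print(round(vne, 3))
```

For eigenvalues between 0 and 1 the entropy never increases as t grows, so the curve falls quickly at first and then flattens; t.max bounds how far along this curve the search goes.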

verbose

int or boolean, optional, default: 1. If TRUE or > 0, print verbose updates.

n.jobs

int, optional, default: 1. The number of jobs to use for the computation. If -1, all CPUs are used. If 1, no parallel computing code is used at all, which is useful for debugging. For n.jobs below -1, (n.cpus + 1 + n.jobs) CPUs are used; thus for n.jobs = -2, all CPUs but one are used.
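The rule for negative values can be written out as a small calculation. The helper `resolve_n_jobs` below is hypothetical and only mirrors the arithmetic described above; rejecting n.jobs = 0 is a choice made for this sketch, since this page does not define that case.

```r
# Hypothetical helper (not part of phateR): number of CPUs implied by
# n.jobs on a machine with n.cpus CPUs, following the rule above.
resolve_n_jobs <- function(n.jobs, n.cpus) {
  stopifnot(n.jobs != 0, n.cpus >= 1)  # n.jobs = 0 is not defined here
  if (n.jobs > 0) {
    return(n.jobs)                     # positive values are used as given
  }
  n.cpus + 1 + n.jobs                  # -1 -> all CPUs, -2 -> all but one
}

jobs.all     <- resolve_n_jobs(-1, n.cpus = 8)
jobs.but.one <- resolve_n_jobs(-2, n.cpus = 8)
jobs.serial  <- resolve_n_jobs(1, n.cpus = 8)
```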

seed

int or NULL, random state (default: NULL)

potential.method

Deprecated. For log potential, use gamma=1. For sqrt potential, use gamma=0.

k

Deprecated. Use knn.

alpha

Deprecated. Use decay.

use.alpha

Deprecated. To disable alpha decay, use decay=NULL.

...

Additional arguments for graphtools.Graph.

Value

"phate" object containing:

Examples

if (reticulate::py_module_available("phate")) {

# Load data
# data(tree.data)
# We use a smaller tree to make examples run faster
data(tree.data.small)

# Run PHATE
phate.tree <- phate(tree.data.small$data)
summary(phate.tree)
## PHATE embedding
## knn = 5, decay = 40, t = 58
## Data: (3000, 100)
## Embedding: (3000, 2)

library(graphics)
# Plot the result with base graphics
plot(phate.tree, col=tree.data.small$branches)
# Plot the result with ggplot2
if (require(ggplot2)) {
  ggplot(phate.tree) +
    geom_point(aes(x=PHATE1, y=PHATE2, color=tree.data.small$branches))
}

# Run PHATE again with different parameters
# We use the last run as initialization
phate.tree2 <- phate(tree.data.small$data, t=150, init=phate.tree)
# Extract the embedding matrix to use in downstream analysis
embedding <- as.matrix(phate.tree2)

}

KrishnaswamyLab/phateR documentation built on Feb. 15, 2021, 4:22 a.m.