pacmap: pacmap

View source: R/pacmap.R

pacmap R Documentation

pacmap

Description

An R wrapper for the PaCMAP Python module found at https://github.com/YingfanWang/PaCMAP

PaCMAP (Pairwise Controlled Manifold Approximation) is a dimensionality reduction method for visualization that preserves both the local and the global structure of the data in the original space. PaCMAP optimizes the low-dimensional embedding using three kinds of point pairs: neighbor pairs (pair_neighbors), mid-near pairs (pair_MN), and further pairs (pair_FP).

Usage

pacmap(
  rdf,
  n_dims = 2,
  n_neighbors = NULL,
  MN_ratio = 0.5,
  FP_ratio = 2,
  pair_neighbors = NULL,
  pair_MN = NULL,
  pair_FP = NULL,
  distance = "euclidean",
  lr = 1,
  num_iters = 450,
  verbose = FALSE,
  apply_pca = TRUE,
  intermediate = FALSE
)
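
The defaults MN_ratio = 0.5 and FP_ratio = 2 can be read as multipliers on the neighbor count. The sketch below assumes, following the upstream Python module, that the number of mid-near and further pairs sampled per point is n_neighbors scaled by the respective ratio. pair_counts is a hypothetical helper written for illustration and is not part of this package; n_neighbors = 10 is an illustrative value, since the wrapper's own default is NULL.

```r
# Hypothetical helper (not part of ReductionWrappers): per-point pair counts
# implied by the ratios, assuming each count is n_neighbors times its ratio.
pair_counts <- function(n_neighbors = 10, MN_ratio = 0.5, FP_ratio = 2) {
  c(
    neighbors = n_neighbors,
    mid_near  = round(n_neighbors * MN_ratio),
    further   = round(n_neighbors * FP_ratio)
  )
}

# With the Usage defaults and 10 neighbors: 10 neighbor, 5 mid-near, 20 further pairs.
print(pair_counts(n_neighbors = 10))
```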

Arguments

rdf

A variable-by-observation data frame.

verbose

logical Whether to print progress messages. Default: FALSE

n_neighbors

integer Number of neighbors considered for the neighbor pairs. Default: NULL

MN_ratio

numeric Ratio of the number of mid-near pairs to the number of neighbors. Default: 0.5

FP_ratio

numeric Ratio of the number of further pairs to the number of neighbors. Default: 2

pair_neighbors

Optional pre-specified neighbor pairs. Default: NULL

pair_MN

Optional pre-specified mid-near pairs. Default: NULL

pair_FP

Optional pre-specified further pairs. Default: NULL

distance

character Distance metric used in the original space. Default: "euclidean"

lr

numeric Learning rate of the optimizer. Default: 1

num_iters

integer Number of optimization iterations. Default: 450

apply_pca

logical Whether to apply PCA to the data before the neighbor pairs are constructed. Default: TRUE

intermediate

logical Whether to also return intermediate stages of the optimization. Default: FALSE

n_dims

integer Dimensions of the embedded space. Default: 2

perplexity

numeric The perplexity is related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. The choice is not extremely critical since t-SNE is quite insensitive to this parameter. Default: 30

early_exaggeration

numeric Controls how tight natural clusters in the original space are in the embedded space and how much space will be between them. For larger values, the space between natural clusters will be larger in the embedded space. Again, the choice of this parameter is not very critical. If the cost function increases during initial optimization, the early exaggeration factor or the learning rate might be too high. Default: 12.0

learning_rate

numeric The learning rate for t-SNE is usually in the range [10.0, 1000.0]. If the learning rate is too high, the data may look like a ‘ball’ with any point approximately equidistant from its nearest neighbours. If the learning rate is too low, most points may look compressed in a dense cloud with few outliers. If the cost function gets stuck in a bad local minimum, increasing the learning rate may help. Default: 200.0

n_iter

integer Maximum number of iterations for the optimization. Should be at least 250. Default: 1000

n_iter_without_progress

integer Maximum number of iterations without progress before we abort the optimization, used after 250 initial iterations with early exaggeration. Note that progress is only checked every 50 iterations so this value is rounded to the next multiple of 50. Default: 300

min_grad_norm

numeric If the gradient norm is below this threshold, the optimization will be stopped. Default: 1e-7

metric

character or callable The metric to use when calculating distance between instances in a feature array. If metric is a character, it must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter, or a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS. If metric is “precomputed”, X is assumed to be a distance matrix. Alternatively, if metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays from X as input and return a value indicating the distance between them. The default is “euclidean”, which is interpreted as squared euclidean distance.

init

character or numpy array Initialization of embedding. Possible options are ‘random’, ‘pca’, and a numpy array of shape (n_samples, n_components). PCA initialization cannot be used with precomputed distances and is usually more globally stable than random initialization. Default: “random”

random_state

int, RandomState instance or NULL If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if NULL, the random number generator is the RandomState instance used by np.random. Note that different initializations might result in different local minima of the cost function. Default: NULL

method

character By default the gradient calculation algorithm uses the Barnes-Hut approximation, running in O(N log N) time. method=‘exact’ runs the slower, but exact, algorithm in O(N^2) time. The exact algorithm should be used when nearest-neighbor errors need to be better than 3%. Default: ‘barnes.hut’

angle

numeric Only used if method=‘barnes.hut’. This is the trade-off between speed and accuracy for Barnes-Hut t-SNE. ‘angle’ is the angular size (also referred to as theta) of a distant node as measured from a point. If this size is below ‘angle’, then it is used as a summary node of all points contained within it. This method is not very sensitive to changes in this parameter in the range of 0.2 - 0.8. An angle less than 0.2 quickly increases computation time, and an angle greater than 0.8 quickly increases error. Default: 0.5

auto_iter

boolean Should optimal parameters be determined? If FALSE, behaves like stock MulticoreTSNE. Default: TRUE

auto_iter_end

integer Number of iterations for parameter optimization. Default: 5000

n_jobs

Number of processors to use. Default: all.

Value

data.frame with PaCMAP coordinates
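
Examples

A minimal sketch of a call, using only arguments shown in the Usage block. The toy data, the row/column orientation passed as rdf, and the requireNamespace guard are assumptions made for illustration; transpose the input with t() if the wrapper expects the other orientation. A working Python installation with the PaCMAP module is also required for the call to succeed.

```r
# Toy input: 100 observations (rows) by 5 numeric variables (columns).
# Whether pacmap() expects this orientation or its transpose is an assumption.
set.seed(1)
toy <- as.data.frame(matrix(rnorm(100 * 5), nrow = 100, ncol = 5))

# Arguments taken from the Usage block; everything else keeps its default.
args <- list(rdf = toy, n_dims = 2, num_iters = 450)

# Run only when the wrapper package is installed; report failures
# (for example a missing Python module) instead of stopping.
if (requireNamespace("ReductionWrappers", quietly = TRUE)) {
  embedding <- tryCatch(
    do.call(ReductionWrappers::pacmap, args),
    error = function(e) {
      message("pacmap() failed: ", conditionMessage(e))
      NULL
    }
  )
  if (!is.null(embedding)) str(embedding)
}
```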


milescsmith/ReductionWrappers documentation built on March 25, 2023, 11:58 a.m.