spectral_umap: Reduce dimensions of a matrix with PCA and UMAP

View source: R/spectral_umap.R

Description

spectral_umap() performs dimensionality reduction using the monocle package's wrapper function around the Python implementation of UMAP (Uniform Manifold Approximation and Projection). If no prcomp_object is supplied, principal component analysis is first performed on the matrix using an irlba algorithm for sparse data, and the selected principal components are then passed to UMAP.
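The steps described above can be sketched with widely available tools. This is a minimal illustration under stated assumptions, not the package's implementation: base prcomp() stands in for the sparse irlba-based PCA, the +1 pseudocount and the transposition to cells-by-genes are assumptions, the toy data are made up, and the UMAP step runs only if the uwot package happens to be installed.

```r
# Toy expression matrix: 200 genes (rows) x 60 cells (columns); hypothetical data.
set.seed(1)
expr <- matrix(rpois(200 * 60, lambda = 5), nrow = 200, ncol = 60)

# Step 1: log10-transform (the +1 pseudocount is an assumption to avoid log10(0)).
log_expr <- log10(expr + 1)

# Step 2: PCA with cells as observations, so transpose to cells x genes.
# Base prcomp() is used here in place of the sparse irlba-based PCA.
pca <- prcomp(t(log_expr), center = TRUE, scale. = TRUE)

# Step 3: keep the first 10 principal components (dims = 1:10).
pcs <- pca$x[, 1:10]

# Step 4: embed the selected components in two dimensions with UMAP,
# only if uwot is installed; parameter values mirror the Usage defaults.
if (requireNamespace("uwot", quietly = TRUE)) {
  embedding <- uwot::umap(pcs, n_neighbors = 30, metric = "correlation",
                          min_dist = 0.1, spread = 1)
  print(dim(embedding))
}
```

Each row of pcs corresponds to one cell, so the resulting embedding has one row per cell and two columns.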

Usage

spectral_umap(matrix, log_matrix = TRUE, prcomp_object = NULL,
  pca_version = "default", center = T, scale = T, dims = 1:10,
  umap_version = "default",
  python_home = system("which python", intern = TRUE),
  n_neighbors = 30L, metric = "correlation", min_dist = 0.1,
  spread = 1)

Arguments

matrix

a matrix of values to perform dimensionality reduction on; by default, rows are genes and columns are cells

log_matrix

a logical value indicating whether a log10 transformation is applied to the matrix before analysis; defaults to TRUE

pca_version

PCA implementation to use. Possible values are "default" for sparse_pca() or "monocle" for the sparse_irlba_prcomp implemented in Monocle 3 alpha.

center

a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, if the "monocle" pca version is used, a centering vector of length equal to the number of columns of the matrix can be supplied.

scale

a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. If the "default" pca version is used and center = TRUE, scaling will also default to TRUE. Alternatively, if the "monocle" pca version is used, a scaling vector of length equal to the number of columns of the matrix can be supplied.
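As a small illustration of what centering and scaling do, the sketch below uses base R's scale() on a made-up matrix whose columns play the role of variables. It only demonstrates the transformation itself; how spectral_umap() applies it internally is not shown here.

```r
# Hypothetical data: 10 observations (rows) of 5 variables (columns).
set.seed(42)
x <- matrix(rnorm(50, mean = 10, sd = 3), nrow = 10, ncol = 5)

# Subtract each column's mean and divide by its standard deviation.
x_scaled <- scale(x, center = TRUE, scale = TRUE)

# After the transformation every column has mean 0 and standard deviation 1
# (up to floating-point error).
col_means <- colMeans(x_scaled)
col_sds <- apply(x_scaled, 2, sd)
print(round(col_means, 10))
print(round(col_sds, 10))
```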

dims

dimensions from the principal component analysis to use; defaults to 1:10 (i.e. 1st to 10th principal components)
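To make the meaning of dims concrete, the following sketch selects components from a base prcomp() result and reports how much variance they capture. The data and variable names are hypothetical, and base prcomp() is an assumption standing in for whichever PCA implementation is used.

```r
# Hypothetical data: 40 observations of 15 variables.
set.seed(7)
obs <- matrix(rnorm(40 * 15), nrow = 40, ncol = 15)
pca <- prcomp(obs, center = TRUE, scale. = TRUE)

# Select the 1st to 10th principal components, as dims = 1:10 would.
dims <- 1:10
selected <- pca$x[, dims, drop = FALSE]

# Fraction of total variance captured by the selected components.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
captured <- sum(var_explained[dims])
print(dim(selected))
print(captured)
```

Inspecting the captured variance in this way is one possible guide for choosing how many components to pass on to UMAP.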

umap_version

UMAP implementation to use; options are "default", "monocle" or "uwot". "monocle" only works if monocle 3 alpha or above is installed. The "default" option uses the UMAP function implemented in monocle 3 alpha, and works even without monocle 3 alpha installed.

python_home

The path to the Python installation in which umap is installed; defaults to the result of running "which python"

n_neighbors

integer (optional, default 30) The size of the local neighborhood (in terms of number of neighboring sample points) used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. In general, values should be in the range 2 to 100.

metric

string or function (optional, default 'correlation') The metric to use to compute distances in high dimensional space. If a string is passed it must match a valid predefined metric. If a general metric is required a function that takes two 1d arrays and returns a float can be provided. For performance purposes it is required that this be a numba jit'd function. Valid string metrics include:

* euclidean
* manhattan
* chebyshev
* minkowski
* canberra
* braycurtis
* mahalanobis
* wminkowski
* seuclidean
* cosine
* correlation
* haversine
* hamming
* jaccard
* dice
* russelrao
* kulsinski
* rogerstanimoto
* sokalmichener
* sokalsneath
* yule

Metrics that take arguments (such as minkowski, mahalanobis etc.) can have arguments passed via the metric_kwds dictionary. At this time care must be taken and dictionary elements must be ordered appropriately; this will hopefully be fixed in the future.
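For intuition about the default 'correlation' metric, the sketch below computes a correlation distance as one minus the Pearson correlation between two vectors. The helper name correlation_distance is hypothetical and written only for this illustration; it is not a function of this package.

```r
# Hypothetical helper: correlation distance as 1 - Pearson correlation.
correlation_distance <- function(a, b) 1 - cor(a, b)

a <- c(1, 2, 3, 4, 5)
b <- c(2, 4, 6, 8, 10)   # a scaled copy of a: perfectly correlated
a_rev <- rev(a)          # reversed copy of a: perfectly anti-correlated

d_same <- correlation_distance(a, b)       # close to 0
d_opposite <- correlation_distance(a, a_rev)  # close to 2
print(c(d_same, d_opposite))
```

Because the distance depends only on the pattern of variation, two vectors that differ by a constant factor are treated as identical, unlike with the euclidean metric.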

min_dist

float (optional, default 0.1) The effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the "spread" value, which determines the scale at which embedded points will be spread out.

spread

float (optional, default 1.0) The effective scale of embedded points. In combination with "min_dist" this determines how clustered/clumped the embedded points are.

prcomp_object

a principal component analysis object produced by the prcomp or prcomp_irlba functions; if no object is supplied, sparse_pca will be run on the matrix to return 50 dimensions; defaults to NULL; if a prcomp object is supplied, matrix is not required
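A precomputed PCA object of the kind described above can be created once and then reused. The sketch below builds one with base prcomp() on made-up data; the +1 pseudocount and the transposition to cells-by-genes are assumptions, and the commented-out calls are hypothetical because they need cellwrangler and a Python umap installation.

```r
# Hypothetical data: 100 genes (rows) x 30 cells (columns).
set.seed(3)
expr <- matrix(rpois(100 * 30, lambda = 4), nrow = 100, ncol = 30)

# Compute the PCA once, with cells as observations.
pca <- prcomp(t(log10(expr + 1)), center = TRUE, scale. = TRUE)

# The result is a "prcomp" object whose scores are stored in pca$x.
print(class(pca))
print(dim(pca$x))

# Hypothetical reuse with different UMAP settings, skipping a repeated PCA:
# emb_local  <- spectral_umap(prcomp_object = pca, dims = 1:10, n_neighbors = 10L)
# emb_global <- spectral_umap(prcomp_object = pca, dims = 1:10, n_neighbors = 50L)
```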

Value

A matrix with two columns, in which each row holds the coordinates of one embedded point in the first and second UMAP dimensions respectively

Examples

spectral_umap(matrix, log_matrix=TRUE, prcomp_object=NULL, dims=1:10)

jacobheng/cellwrangler documentation built on Aug. 12, 2019, 6:49 a.m.