kNNclassify: kNNclassify


Description

The function classifies samples in an unsupervised fashion by:

  1. Running a principal component analysis (PCA).

  2. Using Horn's technique (parallel analysis), via paran, to determine how many components to retain.

  3. Finding the k nearest neighbors of each sample in PCA space.

  4. Calculating the Euclidean distance between samples in PCA space.

  5. Constructing a weighted graph in which each sample is connected to its k nearest neighbors, with edge weight = 1 - Euclidean distance.

  6. Using the Louvain community detection algorithm to classify the samples.
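The graph-building steps above can be sketched on toy data as follows. This is a minimal illustration, not the package's implementation: the toy matrix `expr`, the fixed choice of two retained components, the rescaling of distances to [0, 1] (so that edge weights stay non-negative), and the use of the igraph package for Louvain clustering are all assumptions made for the example.

```r
# Toy data (hypothetical): 50 genes (rows) x 30 samples (columns), three groups.
set.seed(1)
group <- rep(1:3, each = 10)
expr <- matrix(rnorm(50 * 30), nrow = 50) + rep(group * 3, each = 50)
colnames(expr) <- paste0("s", seq_len(ncol(expr)))

# Step 1: PCA with samples as rows; two components are assumed to be retained.
pcs <- prcomp(t(expr), scale. = TRUE)$x[, 1:2]

# Step 4: pairwise Euclidean distances in PCA space.
# Assumption: rescale to [0, 1] so that 1 - distance is never negative.
d <- as.matrix(dist(pcs))
d <- d / max(d)

# Steps 3 and 5: for each sample, pick its k nearest neighbors and record
# one weighted edge per neighbor (weight = 1 - distance).
k <- 5
n <- nrow(d)
knn_edges <- do.call(rbind, lapply(seq_len(n), function(i) {
  others <- setdiff(seq_len(n), i)              # exclude the sample itself
  nn <- others[order(d[i, others])[seq_len(k)]] # indices of the k closest samples
  data.frame(
    from = rownames(d)[i],
    to = rownames(d)[nn],
    weight = unname(1 - d[i, nn]),
    row.names = NULL
  )
}))

# Step 6: Louvain community detection (only runs if igraph is installed).
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(knn_edges, directed = FALSE)
  # Mutual neighbors produce duplicate edges; keep one edge per pair.
  g <- igraph::simplify(g, edge.attr.comb = list(weight = "max"))
  communities <- igraph::cluster_louvain(g)
  print(table(igraph::membership(communities)))
}
```

Each sample contributes exactly k rows to `knn_edges`, so the edge list has n * k rows before duplicate edges between mutual neighbors are collapsed.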

Usage

kNNclassify(cpm, geneIdx, PCiter, k, pca = NULL, quietly = TRUE)

Arguments

cpm

Matrix; Counts per million, with genes as rows and samples as columns.

geneIdx

Integer; Indices of genes to include in PCA.

PCiter

Integer; Length 1 vector indicating the number of iterations to perform when determining the number of retained principal components.

k

Integer; Length 1 vector indicating the number of nearest neighbors for each sample.

pca

Matrix; Optional pre-computed PCA. If NULL, PCA will be computed within the function.

quietly

Logical; controls whether the function runs quietly or prints verbose output.
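To illustrate what PCiter controls, the following is a simplified sketch of Horn's parallel analysis in base R. It is not the paran implementation that the function uses: the helper `parallel_analysis`, its `iterations` argument, the toy data, and the rule of comparing observed eigenvalues against the mean eigenvalues of random data are all assumptions made for the example.

```r
# Hypothetical helper: count the leading components whose eigenvalue exceeds
# the average eigenvalue obtained from random data of the same dimensions.
parallel_analysis <- function(x, iterations = 20) {
  # x: observations in rows, variables in columns
  p <- ncol(x)
  observed <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

  # One column of eigenvalues per iteration, computed from pure noise.
  random <- vapply(seq_len(iterations), function(i) {
    noise <- matrix(rnorm(nrow(x) * p), nrow = nrow(x))
    eigen(cor(noise), symmetric = TRUE, only.values = TRUE)$values
  }, numeric(p))
  threshold <- rowMeans(random)

  # Retain components up to (not including) the first one that fails the test.
  exceeds <- observed > threshold
  if (all(exceeds)) p else which(!exceeds)[1] - 1
}

# Toy data: 100 observations of 10 variables driven by 2 latent factors.
set.seed(42)
latent <- matrix(rnorm(100 * 2), ncol = 2)
loadings <- matrix(0, nrow = 2, ncol = 10)
loadings[1, 1:5] <- 1
loadings[2, 6:10] <- 1
x <- latent %*% loadings + matrix(rnorm(100 * 10, sd = 0.5), ncol = 10)

n_retained <- parallel_analysis(x, iterations = 20)
print(n_retained)
```

More iterations give a more stable random baseline at the cost of run time, which is the trade-off a parameter like PCiter exposes.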

Value

Returns a tibble with two columns: the first gives the sample name and the second gives the classification.

Author(s)

Jason T. Serviss

Examples

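# Set up input data: keep sample columns (names starting with "s")
# and drop ERCC spike-in rows.
s <- stringr::str_detect(colnames(testCounts), "^s")
e <- stringr::str_detect(rownames(testCounts), "^ERCC\\-[0-9]*$")
counts <- testCounts[!e, s]

# Normalize each sample (column) to counts per million.
cpm <- t(t(counts) / colSums(counts) * 10^6)

# Pre-run PCA with samples as rows.
pca <- gmodels::fast.prcomp(t(cpm), scale. = TRUE)$x

# Run KNN graph classification using all genes.
kc <- kNNclassify(cpm, geneIdx = seq_len(nrow(counts)), PCiter = 20, k = 15, pca = pca)

# Plot the first two principal components, colored by classification.
pData <- merge(kc, matrix_to_tibble(pca[, 1:2], "sample"))
plot(pData$PC1, pData$PC2, col = rainbow(4)[pData$louvain], pch = 16)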

EngeLab/kNNclassification documentation built on May 5, 2019, 9:43 p.m.