peakDistance2d: Peak distance in a 2D signature-PCA

Description

Measures the Euclidean distance between the two highest peaks of the 2D density function. The density is estimated from the samples' distribution in the first two principal components of a PCA, using the genes of a given signature.
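The computation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the toy data, the hypothetical helper `findPeaks2d` (a simple 8-neighbour local-maximum search), and the choice to return 0 when fewer than two peaks are found are all assumptions made for the example. Only `prcomp` (stats) and `kde2d` (MASS) are real functions referenced by this page.

```r
library(MASS)  # kde2d lives in MASS

set.seed(1)
# Toy data (assumed): 9 unimodal genes plus one bimodal gene, 100 samples
expr <- matrix(rnorm(9 * 100, mean = 6, sd = 0.5), nrow = 9)
expr <- rbind(expr, c(rnorm(70, 6, 0.1), rnorm(30, 9, 0.1)))
rownames(expr) <- paste0("gene", seq_len(nrow(expr)))
signature <- c("gene1", "gene8", "gene9", "gene10")

# PCA on the signature genes, with samples as observations
pca <- prcomp(t(expr[signature, ]))
pc <- pca$x[, 1:2]

# 2D kernel density estimate of the samples on a 200 x 200 grid
dens <- kde2d(pc[, 1], pc[, 2], n = 200)

# Hypothetical helper: grid cells at or above the threshold whose density
# is strictly greater than that of all 8 neighbouring cells
findPeaks2d <- function(z, threshold) {
  nr <- nrow(z)
  nc <- ncol(z)
  padded <- matrix(-Inf, nr + 2, nc + 2)  # pad so edge cells have neighbours
  padded[2:(nr + 1), 2:(nc + 1)] <- z
  isPeak <- z >= threshold
  for (di in -1:1) for (dj in -1:1) {
    if (di == 0 && dj == 0) next
    neighbour <- padded[(2:(nr + 1)) + di, (2:(nc + 1)) + dj]
    isPeak <- isPeak & (z > neighbour)
  }
  which(isPeak, arr.ind = TRUE)
}

peaks <- findPeaks2d(dens$z, threshold = 0.005)

# Euclidean distance between the two highest peaks
# (assumption: 0 when fewer than two peaks are found)
if (nrow(peaks) >= 2) {
  heights <- dens$z[peaks]
  top <- peaks[order(heights, decreasing = TRUE)[1:2], , drop = FALSE]
  coords <- cbind(dens$x[top[, "row"]], dens$y[top[, "col"]])
  peakDist <- sqrt(sum((coords[1, ] - coords[2, ])^2))
} else {
  peakDist <- 0
}
peakDist
```

In this sketch the bimodal gene dominates the first principal component, so the samples are expected to form two clusters along it, and the distance between the corresponding density peaks serves as the score.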

Usage

peakDistance2d(signature, data, threshold = 0.005, n = 200,
  magnitude = FALSE, scale = FALSE, filtered = FALSE)

Arguments

signature

Character vector with the signature's gene identifiers.

data

Gene expression matrix whose row names are unique gene identifiers, in the same format as those in signature, and whose columns correspond to samples.

threshold

Density cutoff. Density values below the threshold are not considered peaks. Useful when outliers are present in the PCA.

n

Number of grid points in each direction. Can be scalar or a length-2 integer vector. See kde2d.
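As a small illustration of the two accepted forms of `n`, the following calls `kde2d` from MASS directly on made-up data (the vectors `x` and `y` are assumptions for the example). A scalar gives a square grid, while a length-2 vector sets the number of grid points per axis.

```r
library(MASS)  # provides kde2d

set.seed(42)
x <- rnorm(100)
y <- rnorm(100)

# Scalar n: same number of grid points in both directions
squareGrid <- kde2d(x, y, n = 50)

# Length-2 n: 30 grid points along x, 60 along y
rectGrid <- kde2d(x, y, n = c(30, 60))

dim(squareGrid$z)  # 50 50
dim(rectGrid$z)    # 30 60
```

A finer grid locates peaks more precisely at the cost of more computation.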

magnitude

When TRUE the score is multiplied by the cluster density. Default: FALSE.

scale

Logical value indicating whether the variables should be scaled to unit variance before the analysis. See prcomp for details. Default: FALSE.
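To show what scaling changes, the sketch below calls `prcomp` directly on made-up data in which one gene has a much larger variance than the others. The data are an assumption for the example, and how `peakDistance2d` forwards `scale` to `prcomp` internally is not shown on this page. Note that the corresponding `prcomp` argument is spelled `scale.`.

```r
set.seed(7)
# Toy data (assumed): 3 genes x 50 samples, the third gene is far more variable
expr <- rbind(rnorm(50, 6, 0.1), rnorm(50, 6, 0.1), rnorm(50, 6, 5))
rownames(expr) <- paste0("gene", 1:3)

unscaled <- prcomp(t(expr), scale. = FALSE)
scaled   <- prcomp(t(expr), scale. = TRUE)

# Proportion of the total variance captured by the first component
propUnscaled <- unscaled$sdev[1]^2 / sum(unscaled$sdev^2)
propScaled   <- scaled$sdev[1]^2 / sum(scaled$sdev^2)

c(unscaled = propUnscaled, scaled = propScaled)
```

Without scaling, the high-variance gene dominates the first component. With scaling, every gene contributes unit variance, so no single gene dominates merely because of its measurement range.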

filtered

Logical value indicating whether genes in the supplied signature that are not present in the data have already been filtered out. Default: FALSE.
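A signature can be filtered beforehand along these lines. This is a sketch with made-up identifiers; `dataGenes` stands in for `rownames(data)`, and the commented call at the end only indicates where the filtered signature would be passed.

```r
# Stand-in for rownames(data) (assumed identifiers)
dataGenes <- paste0("gene", 1:10)
signature <- c("gene1", "gene8", "gene20")

# Keep only the signature genes that are present in the data
present <- signature[signature %in% dataGenes]
present  # "gene1" "gene8"

# The filtered signature could then be supplied with filtered = TRUE, e.g.
# peakDistance2d(present, data, filtered = TRUE)
```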

Value

A vector of length 2. The first value corresponds to the score and the second to the number of genes in the signature that were found in the data.

Examples

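#nine genes with mean 6 and standard deviations 0.1 to 0.9, 100 samples each
dummyData <- do.call(rbind, lapply(seq(0.1, 0.9, by = 0.1),
                     function(s) rnorm(n = 100, mean = 6, sd = s)))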

#add a row with bimodal gene expression
dummyData <- rbind(dummyData, c(rnorm(70, 6, 0.1), rnorm(30, 9, 0.1)))

rownames(dummyData) <- paste0("gene", seq_len(nrow(dummyData)))
rownames(dummyData)

dummySig <- c("gene1", "gene8", "gene9", "gene10", "gene20", "gene30")

peakDistance2d(dummySig, dummyData)

#the returned values are the peak distance and the number of genes from the
#signature that were found in the data, respectively (see Value)

#removing the bimodal gene from the signature results in a lower score
peakDistance2d(dummySig[-4], dummyData)

sidiropoulos/PSigA documentation built on May 29, 2019, 9:58 p.m.