neighborDistances: Compute distances to neighbors
In cydar: Using Mass Cytometry for Differential Abundance Analyses


Calculate the distances in high-dimensional space to the neighboring cells.

neighborDistances(prepared, neighbors = 50, downsample = 50, as.tol = TRUE)

`prepared`	A List object containing a BiocNeighborIndex object, typically produced by `prepareCellData`.
`neighbors`	An integer scalar specifying the number of nearest neighbors to find for each examined cell.
`downsample`	An integer scalar specifying the downsampling frequency, i.e., one in every `downsample` cells is examined.
`as.tol`	A logical scalar specifying if the distances should be reported as tolerance values.

This function examines each cell at the specified downsampling frequency, and computes the Euclidean distances to its nearest neighbors. If as.tol=TRUE, these distances are reported on the same scale as tol in countCells. This allows users to choose a value for tol based on the output of this function. Otherwise, the distances are reported without modification.
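The relationship between raw distances and tolerances can be sketched in base R without cydar. The sketch below assumes that the hypersphere radius in countCells is tol multiplied by the square root of the number of markers, so that a tolerance is a distance divided by that factor. The simulated data, the brute-force search and all variable names are illustrative only; this is not how neighborDistances is implemented.

```r
# Illustrative sketch only: simulated data and a brute-force search.
set.seed(100)
nmarkers <- 10
coords <- matrix(rnorm(200 * nmarkers), ncol = nmarkers)  # 200 simulated cells

k <- 5                             # number of neighbors to report
full <- as.matrix(dist(coords))    # all pairwise Euclidean distances

# For each cell, sort its distances and drop the first entry,
# which is the zero distance from the cell to itself.
raw <- t(apply(full, 1, function(d) sort(d)[seq_len(k) + 1]))

# ASSUMPTION: radius = tol * sqrt(number of markers),
# so a tolerance is the distance divided by sqrt(nmarkers).
as.tolerance <- raw / sqrt(nmarkers)

dim(raw)  # one row per cell, one column per neighbor
```

Under this assumption, the two scales differ only by a constant factor, so boxplots of either matrix have the same shape and only the axis values change.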

To visualize the distances/tolerances, one option is to use boxplots, as shown below. Each boxplot represents the distribution of tolerances required for hyperspheres to contain a certain number of cells. For example, assume that at least 20 cells in each hypersphere are needed to have sufficient power for hypothesis testing. Each hypersphere is centred on a cell, so it must be large enough to include the 19th nearest neighbor of that cell. The typical tolerance required to do so is the median of the boxplot generated from the 19th column of the output.

Another option is to examine the distribution of counts at a given tolerance/distance. This is done by counting the number of hyperspheres with a particular number of nearest neighbors closer than the specified tolerance. In this manner, the expected count distribution from setting a particular tolerance can be determined. Note that the counts in the histogram are capped at the value of neighbors (plus one for the central cell), as only that many neighbor distances are computed to save time.

Note that, for each examined cell, its neighbors are identified from the full set of cells. Downsampling only changes the rate at which cells are examined, for the sake of computational efficiency. Neighbors are not identified from the downsampled set as this will inflate the reported distances.
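The effect described above can be checked with a small base R simulation. This is a minimal sketch under stated assumptions: knn.dist is a hypothetical brute-force helper (not a cydar or BiocNeighbors function), the data are simulated, and taking every 10th cell is an assumed selection scheme that may differ from the one used internally.

```r
# Illustrative sketch only: compare neighbor searches in the full and downsampled sets.
set.seed(42)
coords <- matrix(rnorm(500 * 5), ncol = 5)   # 500 simulated cells, 5 markers
k <- 3
downsample <- 10
examined <- seq(1, nrow(coords), by = downsample)  # ASSUMPTION: every 10th cell

# Hypothetical helper: distances from each query row to its k nearest
# reference rows. Each query is assumed to be present in the reference,
# so the smallest distance (zero, to itself) is dropped.
knn.dist <- function(query, reference, k) {
    t(apply(query, 1, function(q) {
        d <- sqrt(colSums((t(reference) - q)^2))
        sort(d)[seq_len(k) + 1]
    }))
}

centres <- coords[examined, , drop = FALSE]
from.full <- knn.dist(centres, coords, k)     # neighbors from all cells
from.subset <- knn.dist(centres, centres, k)  # neighbors from examined cells only

# Searching only the downsampled set can never give smaller distances.
all(from.full <= from.subset)
median(from.subset[, k]) / median(from.full[, k])
```

The number of rows in either matrix equals the number of examined cells; only the pool of candidate neighbors differs between the two searches.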

A numeric matrix of distances where each row corresponds to an examined cell and each column i corresponds to the ith closest neighbor.

Aaron Lun

prepareCellData, to generate the prepared object.

countCells, where the choice of tol can be guided by the distance distributions.

example(prepareCellData, echo=FALSE)

distances <- neighborDistances(cd, as.tol=FALSE)
boxplot(distances, xlab="Neighbor", ylab="Distance")

# Making a plot to choose 'tol' in countCells().
distances <- neighborDistances(cd, as.tol=TRUE)
boxplot(distances, xlab="Neighbor", ylab="Tolerance")

required.count <- 20 # 20 cells per hypersphere 
med <- median(distances[,required.count-1]) 
segments(-10, med, required.count-1, col="dodgerblue")
segments(required.count-1, med, y1=0, col="dodgerblue")

# Examining the distribution of counts at a given 'tol' of 0.7.
# (Adding 1 to account for the cell at the centre of the hypersphere.)
counts <- rowSums(distances <= 0.7) + 1
hist(counts, xlab="Count per hypersphere")