Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/pickBestMarkers.R

Pick the best markers that distinguish between cells in and outside of a set of hyperspheres.

1 | ```
pickBestMarkers(x, chosen, downsample=10, p=0.05)
``` |

`x` |
A CyData object, constructed using |

`chosen` |
A vector specifying the rows of |

`downsample` |
A numeric scalar specifying the cell downsampling interval. |

`p` |
A numeric scalar defining the quantiles for gating. |

A putative subpopulation is defined by a user-supplied set of hyperspheres in `chosen`

.
Cells in `cellIntensities(x)`

are downsampled according to `downsample`

.
Then, this function identifies all cells in the downsampled set that were counted into any of the hyperspheres specified by `chosen`

at the tolerance `tol`

.
We recommend that `downsample`

also be set to the same value as that used in `countCells`

to construct `x`

.
(This ensures that the identified cells are consistent with those that were originally counted.
It also avoids situations where no cells are counted into hyperspheres for rare subpopulations, which prevents GLM fitting as the response will only have one level.)

Relevant markers are identified by fitting a binomial GLM with LASSO regression to the downsampled cells, using the `glmnet`

function.
The response is whether or not the cell was counted into the hyperspheres (and thus, the subpopulation).
The covariates are the marker intensities of each cell, used in a simple additive model with an intercept.
Upon fitting, the markers can be ranked from most to least important in terms of their ability to separate counted from uncounted cells.
This is done based on the LASSO iteration at which each marker's coefficient becomes non-zero - smaller values indicate more importance, while equal values indicate tied importance.
A panel of useful markers can subsequently be constructed by taking the top set from this ranking.

To evaluate the performance of each extra marker, we consider a progressive gating scheme.
For each marker, we define the gating boundaries as the interval between the `p`

and `1-p`

quantiles.
For a top set of markers, we calculate the number of cells from the subpopulation that fall inside the gating boundaries for each marker (i.e., true positives).
We repeat this for the number of cells not in the subpopulation (false positives).
This allows us to compute the recovery (i.e., sensitivity) of the gating scheme as the proportion of true positives out of the total number of cells in the subpopulation;
and the contamination (i.e., non-specificity), as the proportion of false positives out of the total number of gated cells.

A data frame is returned, where each row is a marker ordered in terms of decreasing importance.
The combined contamination and recovery proportions of the top `n`

markers are reported at row `n`

, along with the LASSO iteration to denote ties.
The lower and upper gating boundaries are also reported for each marker.

Aaron Lun

`countCells`

,
`prepareCellData`

,
`glmnet`

1 2 3 4 5 6 7 8 9 10 11 12 13 | ```
# Mocking up some data with two clear subpopulations.
nmarkers <- 10L
ex1 <- matrix(rgamma(nmarkers*1000, 2, 2), ncol=nmarkers, nrow=1000)
ex2 <- ex1; ex2[,1:4] <- ex2[,1:4] + 1
ex <- rbind(ex1, ex2)
colnames(ex) <- paste0("X", seq_len(nmarkers))
cd <- prepareCellData(list(A=ex))
cnt <- countCells(cd, filter=1L)
# Selecting all hyperspheres from one population.
second.pop <- cellInformation(cnt)$row > nrow(ex1)
selected <- second.pop[getCenterCell(cnt)]
pickBestMarkers(cnt, chosen=selected)
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.