Pick the best markers that distinguish between cells in and outside of a set of hyperspheres.
pickBestMarkers(x, chosen, downsample=10, p=0.05)
x: A CyData object, constructed using countCells.
chosen: A vector specifying the rows of x that correspond to the hyperspheres of interest.
downsample: A numeric scalar specifying the cell downsampling interval.
p: A numeric scalar defining the quantiles for gating.
A putative subpopulation is defined by a user-supplied set of hyperspheres in x.
The cells in cellIntensities(x) are downsampled according to downsample.
Then, this function identifies all cells in the downsampled set that were counted into any of the hyperspheres specified by
chosen at the tolerance
We recommend that downsample also be set to the same value as that used in countCells to construct x.
(This ensures that the identified cells are consistent with those that were originally counted.
It also avoids situations where no cells are counted into hyperspheres for rare subpopulations, which prevents GLM fitting as the response will only have one level.)
Relevant markers are identified by fitting a binomial GLM with LASSO regression to the downsampled cells, using the
The response is whether or not the cell was counted into the hyperspheres (and thus, the subpopulation).
The covariates are the marker intensities of each cell, used in a simple additive model with an intercept.
Upon fitting, the markers can be ranked from most to least important in terms of their ability to separate counted from uncounted cells.
This is done based on the LASSO iteration at which each marker's coefficient becomes non-zero; smaller values indicate more importance, while equal values indicate tied importance.
A panel of useful markers can subsequently be constructed by taking the top set from this ranking.
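The ranking step can be sketched as follows. This is an illustration rather than the function's actual implementation: it assumes the glmnet package is used for the LASSO fit, works on mock intensities rather than a CyData object, and the names intensities, in.subpop, first.step and ranking are made up for the example.

```r
library(glmnet)

# Mock data: 6 markers, of which only X1 and X2 differ in the subpopulation.
set.seed(100)
nmarkers <- 6
bg <- matrix(rgamma(nmarkers * 500, 2, 2), ncol = nmarkers)
sub <- bg
sub[, 1:2] <- sub[, 1:2] + 1
intensities <- rbind(bg, sub)
colnames(intensities) <- paste0("X", seq_len(nmarkers))
in.subpop <- rep(c(FALSE, TRUE), each = nrow(bg))

# Binomial GLM with a LASSO penalty: additive model with an intercept.
fit <- glmnet(intensities, factor(in.subpop), family = "binomial")

# fit$beta has one row per marker and one column per step of the lambda path.
# For each marker, record the first step with a non-zero coefficient.
nonzero <- as.matrix(fit$beta) != 0
first.step <- apply(nonzero, 1, function(z) {
    if (any(z)) which(z)[1] else NA_integer_
})

# Smaller steps mean more important markers; equal steps are ties.
ranking <- sort(first.step, na.last = TRUE)
ranking
```

Markers that never enter the model get NA and are placed last by sort.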
To evaluate the performance of each extra marker, we consider a progressive gating scheme.
For each marker, we define the gating boundaries as the interval between the p and 1-p quantiles.
For a top set of markers, we calculate the number of cells from the subpopulation that fall inside the gating boundaries for each marker (i.e., true positives).
We repeat this for the number of cells not in the subpopulation (false positives).
This allows us to compute the recovery (i.e., sensitivity) of the gating scheme as the proportion of true positives out of the total number of cells in the subpopulation;
and the contamination (i.e., non-specificity), as the proportion of false positives out of the total number of gated cells.
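A minimal base R sketch of this progressive gating scheme is shown here. It is not the function's own code: progressiveGating is a hypothetical helper, the output column names are invented for illustration, and it assumes that the gating boundaries are the p and 1-p quantiles of the subpopulation's intensities for each marker.

```r
# Hypothetical helper, written only to illustrate the calculation.
progressiveGating <- function(intensities, in.subpop, ranked, p = 0.05) {
    inside <- rep(TRUE, nrow(intensities))  # cells passing every gate so far
    out <- data.frame(Marker = ranked, Lower = NA_real_, Upper = NA_real_,
                      Recovery = NA_real_, Contamination = NA_real_)
    for (i in seq_along(ranked)) {
        vals <- intensities[, ranked[i]]
        # Assumption: boundaries come from the subpopulation cells only.
        bounds <- quantile(vals[in.subpop], c(p, 1 - p), names = FALSE)
        inside <- inside & vals >= bounds[1] & vals <= bounds[2]
        tp <- sum(inside & in.subpop)   # true positives
        fp <- sum(inside & !in.subpop)  # false positives
        out$Lower[i] <- bounds[1]
        out$Upper[i] <- bounds[2]
        out$Recovery[i] <- tp / sum(in.subpop)
        out$Contamination[i] <- fp / (tp + fp)
    }
    out
}

# Mock data: the subpopulation is shifted in X1 only.
set.seed(100)
bg <- matrix(rgamma(2000, 2, 2), ncol = 2)
sub <- bg
sub[, 1] <- sub[, 1] + 1
intensities <- rbind(bg, sub)
colnames(intensities) <- c("X1", "X2")
in.subpop <- rep(c(FALSE, TRUE), each = nrow(bg))

res <- progressiveGating(intensities, in.subpop, ranked = c("X1", "X2"))
res
```

Because each extra marker adds another gate, recovery can only stay the same or drop as more markers are included, while contamination changes with the ratio of false positives to gated cells.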
A data frame is returned, where each row is a marker ordered in terms of decreasing importance.
The combined contamination and recovery proportions of the top n markers are reported at row n, along with the LASSO iteration to denote ties.
The lower and upper gating boundaries are also reported for each marker.
# Mocking up some data with two clear subpopulations.
nmarkers <- 10L
ex1 <- matrix(rgamma(nmarkers*1000, 2, 2), ncol=nmarkers, nrow=1000)
ex2 <- ex1
ex2[,1:4] <- ex2[,1:4] + 1
ex <- rbind(ex1, ex2)
colnames(ex) <- paste0("X", seq_len(nmarkers))
cd <- prepareCellData(list(A=ex))
cnt <- countCells(cd, filter=1L)

# Selecting all hyperspheres from one population.
second.pop <- cellInformation(cnt)$row > nrow(ex1)
selected <- second.pop[getCenterCell(cnt)]
pickBestMarkers(cnt, chosen=selected)