CAM3Run: Convex Analysis of Mixtures Version 3

CAM3Run R Documentation

Convex Analysis of Mixtures Version 3

Description

This function performs a fully unsupervised computational deconvolution to identify marker genes that define each of the multiple subpopulations, and estimate the proportions of these subpopulations in the mixture tissues as well as their respective expression profiles.

Usage

CAM3Run(
  data,
  K = NULL,
  dim.rdc = 10,
  thres.low = 0.05,
  thres.high = 1,
  cluster.method = c("Fixed-Radius", "K-Means"),
  radius.thres = 0.95,
  sim.thres = 0.95,
  cluster.num = 50,
  MG.num.thres = 20,
  sample.weight = NULL,
  fast.mode = TRUE,
  generalNMF = FALSE
)

Arguments

data

Matrix of mixture expression profiles. A data frame, SummarizedExperiment, or ExpressionSet object will be internally coerced into a matrix. Each row is a gene and each column is a sample. Data should be in non-log linear space with non-negative numerical values (i.e. >= 0). Missing values are not supported. All-zero rows will be removed internally.

K

The candidate subpopulation number(s), e.g. K = 2:8.

dim.rdc

Reduced data dimension; should be no less than the maximum candidate K.

thres.low

The lower bound of the percentage of genes to keep for CAM with ranked norm. The value should be between 0 and 1. The default is 0.05.

thres.high

The upper bound of the percentage of genes to keep for CAM with ranked norm. The value should be between 0 and 1. The default is 1.

cluster.method

The clustering method. The default "Fixed-Radius" makes all clusters the same size. The alternative "K-Means" uses kmeans.

radius.thres

The "cosine" radius of "Fixed-Radius" clustering. The default is 0.95.

sim.thres

The cosine similarity threshold of cluster centers. Clusters whose cosine similarity is higher than the threshold are merged until the number of clusters equals cluster.num. This parameter can control the upper bound of similarity among sources. The default is 0.95.

cluster.num

The lower bound of cluster number, which should be much larger than K. The default is 50.

MG.num.thres

Clusters containing fewer genes than MG.num.thres will be treated as outliers. The default is 20.

sample.weight

Vector of sample weights. If NULL, all samples have the same weight. The length should equal the number of samples. All values should be positive.

fast.mode

Whether to use the fast mode of greedy search. The normal mode may give more accurate results, but its computation time is much longer. The default is TRUE.

generalNMF

If TRUE, the decomposed proportion matrix has no sum-to-one constraint for each row. The default is FALSE. Setting it to TRUE brings two changes: (1) Without assuming samples are normalized, the first principal component will not be forced to be along c(1,1,..,1); instead, a standard PCA will be applied during preprocessing. (2) Without the sum-to-one constraint for each row, the scale ambiguity of each column vector in the proportion matrix will not be removed.

cores

The number of system cores for parallel computing. If not provided, one core for each element in K will be invoked. Zero value will disable parallel computing.
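
The input requirements listed in the arguments above can be checked before calling CAM3Run. The following is a minimal sketch in base R. The helper name check_cam3_input and the toy matrix are hypothetical and not part of the CAM3 package, and the log2(x + 1) scale of the toy data is an assumption made only for illustration; CAM3Run performs its own coercion and all-zero row removal internally.

```r
# Hypothetical helper (not part of CAM3): validate a mixture matrix
# against the requirements stated for data, K, dim.rdc and sample.weight.
check_cam3_input <- function(data, K, dim.rdc, sample.weight = NULL) {
  data <- as.matrix(data)                      # rows = genes, columns = samples
  stopifnot(is.numeric(data))
  if (anyNA(data)) stop("Missing values are not supported.")
  if (any(data < 0)) stop("Values must be non-negative (non-log linear space).")
  if (dim.rdc < max(K)) stop("dim.rdc must be no less than max(K).")
  if (!is.null(sample.weight)) {
    if (length(sample.weight) != ncol(data)) stop("One weight per sample is required.")
    if (any(sample.weight <= 0)) stop("Sample weights must be positive.")
  }
  # Drop all-zero rows here only to show what remains after their removal.
  data[rowSums(data) > 0, , drop = FALSE]
}

# Toy data: 4 genes x 3 samples, assumed to be stored as log2(x + 1).
log_expr <- matrix(c(0, 0, 0,
                     1, 2, 3,
                     4, 0, 2,
                     3, 3, 3), nrow = 4, byrow = TRUE)
linear_expr <- 2^log_expr - 1                  # back to non-log linear space
clean <- check_cam3_input(linear_expr, K = 2:3, dim.rdc = 3,
                          sample.weight = c(1, 1, 2))
dim(clean)                                     # dimensions after dropping the all-zero gene
```

Running such a check first makes failures easier to interpret, because the error message names the violated requirement.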

Details

This function includes three necessary steps to decompose a matrix of mixture expression profiles: data preprocessing, marker gene cluster search, and matrix decomposition. They are implemented in CAM3Prep, CAM3MGCluster and CAM3ASest, respectively. More details can be found in the help document of each function.
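
To make the decomposition target concrete, here is a toy linear mixing model in base R. It assumes the layout described for data and generalNMF: genes in rows, samples in columns, and a proportion matrix with one row per sample that sums to one. The names A, S and X are illustrative only and do not refer to CAM3 internals.

```r
set.seed(1)
K <- 3; n_genes <- 6; n_samples <- 4

# S: subpopulation-specific expression profiles (genes x K), non-negative.
S <- matrix(rexp(n_genes * K), nrow = n_genes, ncol = K)

# A: proportions (samples x K); each row is scaled to sum to one.
A <- matrix(runif(n_samples * K), nrow = n_samples, ncol = K)
A <- A / rowSums(A)

# X: observed mixtures (genes x samples), the kind of matrix passed as data.
X <- S %*% t(A)

rowSums(A)   # each sample's proportions sum to one
dim(X)       # genes x samples
```

In this sketch only X would be observed; an unsupervised deconvolution aims to recover matrices playing the roles of A and S from X alone.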

For this function, you need to specify the range of possible subpopulation numbers and the percentage of low/high-expressed genes to be removed. Typically, 30% [...] gene expression data. The removal of high-expressed genes has much less impact on results, and is usually set to be 0% [...]

This function can also analyze other molecular expression data, such as proteomics data. Far fewer low-expressed proteins need to be removed, e.g. 0% [...]
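
As a rough illustration of how thres.low and thres.high act as bounds on a gene ranking, the sketch below ranks genes by the Euclidean norm of their rows and keeps those lying between two quantiles. This is an assumption about the general idea, written in base R; the actual filtering happens inside the preprocessing step and may differ in detail, for example in the norm used or in how boundaries are handled. The function name filter_by_norm is hypothetical.

```r
# Hypothetical helper (not part of CAM3): keep genes whose row norm lies
# between the thres.low and thres.high quantiles of all row norms.
filter_by_norm <- function(data, thres.low = 0.05, thres.high = 1) {
  stopifnot(thres.low >= 0, thres.high <= 1, thres.low < thres.high)
  gene_norm <- sqrt(rowSums(data^2))            # one norm per gene (row)
  bounds <- quantile(gene_norm, c(thres.low, thres.high))
  keep <- gene_norm >= bounds[1] & gene_norm <= bounds[2]
  data[keep, , drop = FALSE]
}

set.seed(42)
toy <- matrix(rexp(100 * 5), nrow = 100,
              dimnames = list(paste0("gene", 1:100), NULL))
kept <- filter_by_norm(toy, thres.low = 0.30, thres.high = 0.95)
nrow(kept)   # number of toy genes remaining after both cuts
```

With thres.low = 0.30 and thres.high = 0.95, the lowest-norm 30% and the highest-norm 5% of the toy genes are dropped, which mirrors the settings used in the Examples section.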

Value

An object of class "CAMObj" containing the following components:

PrepResult

An object of class "CAMPrepObj" containing data preprocessing results from the CAM3Prep function.

MGResult

A list of "CAMMGObj" objects containing marker gene detection results from the CAM3MGCluster function for each K value.

ASestResult

A list of "CAMASObj" objects containing estimated proportions, subpopulation-specific expressions and mdl values from the CAM3ASest function for each K value.

Examples

# Load the package and obtain the example data
library(CAM3)
data(ratMix3)
data <- ratMix3$X

# CAM3 with a known subpopulation number
rCAM3 <- CAM3Run(data, K = 3, dim.rdc = 3, thres.low = 0.30, thres.high = 0.95)
# A larger dim.rdc can improve performance but increases time complexity

## Not run:
# CAM3 with a range of candidate subpopulation numbers
rCAM3 <- CAM3Run(data, K = 2:5, dim.rdc = 10, thres.low = 0.30,
                 thres.high = 0.95)

## End(Not run)

ChiungTingWu/CAM3 documentation built on Feb. 14, 2024, 9:22 a.m.