CLLPDestimate: Estimate CLL-PD in user-specified expression dataset


View source: R/CLLPDestimate.R

Description

CLLPDestimate returns the estimated CLL-PD value in the user-specified cohort.

Usage

CLLPDestimate(
  exprMatrix,
  identifier = "ensembl_gene_id",
  topVariant = NULL,
  normalize = TRUE,
  repeats = 20
)

Arguments

exprMatrix

A numeric matrix that contains the expression levels of genes or probes for the samples in a cohort. Row names are gene identifiers and column names are sample identifiers.

identifier

A character variable that specifies the type of gene identifier. Currently only "ensembl_gene_id" and "gene_symbol" are allowed.

topVariant

If specified, it should be a numeric value indicating the number of most variant features (genes or probes) in the user-specified data to use. The default value is NULL, which means all rows in the input matrix will be used.

normalize

A boolean value indicating whether the user-specified expression matrix should be centered by mean and scaled by standard deviation. The default value is TRUE.

repeats

A numeric value specifying how many times cross-validation is repeated to select the prediction model. The default value is 20.
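As a rough illustration of what these arguments describe, the sketch below builds a toy input matrix in base R and mimics the `topVariant` and `normalize` steps by hand. The gene IDs, sample names, and values are placeholders, and applying the centering and scaling per feature (row) is an assumption of this sketch; the package's internal implementation may differ.

```r
# Toy expression matrix: 6 features (rows) x 4 samples (columns).
# The Ensembl-style IDs and sample names are placeholders, not real data.
set.seed(1)
exprMatrix <- matrix(
  rnorm(24, mean = 8, sd = 2),
  nrow = 6,
  dimnames = list(
    sprintf("ENSG%011d", 1:6),
    paste0("sample", 1:4)
  )
)

# Roughly what topVariant = 3 describes: keep the 3 rows with the largest variance.
rowVars <- apply(exprMatrix, 1, var)
topRows <- names(sort(rowVars, decreasing = TRUE))[1:3]
exprTop <- exprMatrix[topRows, , drop = FALSE]

# Roughly what normalize = TRUE describes: center by mean, scale by SD.
# Doing this per feature (row) is an assumption of this sketch.
exprScaled <- t(scale(t(exprTop)))

# The real call needs the mofaCLL package and genuine gene identifiers:
# res <- CLLPDestimate(exprMatrix, identifier = "ensembl_gene_id",
#                      topVariant = 3, normalize = TRUE, repeats = 20)
```

With a real dataset the function handles both steps itself, so manual filtering or scaling like this should only be needed when inspecting the input beforehand.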

Details

This function takes a gene expression dataset (RNAseq or microarray) from an external, user-specified CLL cohort, builds a regularized linear model using the expression values of the features that overlap with the built-in training cohort, and uses the selected model to estimate CLL-PD in the external cohort.
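The workflow described above can be sketched in base R on simulated data. This is not the package's code: ridge regression in closed form stands in for the unspecified regularized model, and the cohorts, gene names, lambda grid, fold count, and helper functions (`fitRidge`, `predictRidge`) are all hypothetical. Note that ridge keeps every coefficient non-zero, whereas the documented output implies a sparse model.

```r
set.seed(42)

# Simulated stand-ins for the built-in training cohort and an external cohort.
genesTrain <- paste0("gene", 1:30)
genesExt   <- paste0("gene", 11:40)   # only gene11..gene30 overlap
trainX <- matrix(rnorm(30 * 60), nrow = 30, dimnames = list(genesTrain, NULL))
extX   <- matrix(rnorm(30 * 15), nrow = 30, dimnames = list(genesExt, NULL))
trainY <- as.numeric(0.8 * trainX["gene12", ] - 0.5 * trainX["gene25", ] +
                       rnorm(60, sd = 0.3))

# Step 1: restrict both cohorts to the overlapping features.
shared <- intersect(rownames(trainX), rownames(extX))
X    <- t(trainX[shared, ])   # samples x features
Xnew <- t(extX[shared, ])

# Closed-form ridge regression with an unpenalized intercept.
fitRidge <- function(X, y, lambda) {
  mu <- colMeans(X)
  Xc <- sweep(X, 2, mu)
  beta <- drop(solve(crossprod(Xc) + lambda * diag(ncol(Xc)),
                     crossprod(Xc, y - mean(y))))
  list(beta = beta, intercept = mean(y) - sum(mu * beta))
}
predictRidge <- function(fit, X) drop(X %*% fit$beta) + fit$intercept

# Step 2: repeated cross-validation; each repeat picks lambda by CV error,
# refits on the full training cohort, and records the training R2.
lambdas  <- c(0.1, 1, 10, 100)
nRepeats <- 5
nFolds   <- 5
runs <- lapply(seq_len(nRepeats), function(r) {
  folds <- sample(rep(seq_len(nFolds), length.out = nrow(X)))
  cvErr <- sapply(lambdas, function(l) {
    mean(sapply(seq_len(nFolds), function(k) {
      fit <- fitRidge(X[folds != k, , drop = FALSE], trainY[folds != k], l)
      pred <- predictRidge(fit, X[folds == k, , drop = FALSE])
      mean((trainY[folds == k] - pred)^2)
    }))
  })
  fit <- fitRidge(X, trainY, lambdas[which.min(cvErr)])
  r2 <- 1 - sum((trainY - predictRidge(fit, X))^2) /
    sum((trainY - mean(trainY))^2)
  list(fit = fit, r2 = r2)
})

# Step 3: keep the run with the highest R2 and apply it to the external cohort.
r2Values  <- sapply(runs, `[[`, "r2")
best      <- runs[[which.max(r2Values)]]$fit
estimated <- predictRidge(best, Xnew)
```

Restricting to shared features before fitting is what lets the same coefficients be applied to the external cohort, which is why the overlap step comes first.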

Value

A list containing three objects: 1) estimated_CLLPD: a numeric vector of the estimated CLL-PD values in the user-specified cohort; 2) a data frame of the features with non-zero coefficients and their coefficients in the selected model (the model with the highest R2 value); 3) a numeric vector of variance explained (R2) values for CLL-PD in the built-in cohort across the repeated cross-validation runs.
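To show how such a result might be handled, the sketch below uses a mock list with the same three-part layout. Only the name `estimated_CLLPD` comes from the description above; the values, the data frame column names, and the sample labels are invented for illustration, so the second and third elements are accessed by position.

```r
# Mock result mirroring the documented structure; all values are invented.
res <- list(
  estimated_CLLPD = c(sample1 = 0.42, sample2 = -1.10, sample3 = 0.05),
  data.frame(feature = c("geneA", "geneB"), coefficient = c(0.31, -0.12)),
  c(0.61, 0.58, 0.63)
)

cllpd     <- res$estimated_CLLPD   # name given in the documentation
coefTable <- res[[2]]              # by position; element name not stated
r2Runs    <- res[[3]]              # by position; element name not stated

# Samples ordered from highest to lowest estimated CLL-PD.
ranked <- sort(cllpd, decreasing = TRUE)

# Spread of R2 across the repeated cross-validation runs.
r2Range <- range(r2Runs)
```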


lujunyan1118/mofaCLL documentation built on Dec. 21, 2021, 12:42 p.m.