bid: Calculate Differential Expression (DE) or Differential...

View source: R/pipeline_functions.R

bid    R Documentation

Calculate Differential Expression (DE) or Differential Activity (DA) by Using Bayesian Inference

Description

bid calculates differential expression (DE) or differential activity (DA) using Bayesian inference. Users can choose among different regression models and pooling strategies.

Usage

bid(
  mat = NULL,
  use_obs_class = NULL,
  class_order = NULL,
  class_ordered = TRUE,
  method = "Bayesian",
  family = gaussian,
  pooling = "full",
  prior.V.scale = 0.02,
  prior.R.nu = 1,
  prior.G.nu = 2,
  nitt = 13000,
  burnin = 3000,
  thin = 10,
  std = TRUE,
  logTransformed = TRUE,
  log.base = 2,
  average.method = "geometric",
  pseudoCount = 0,
  return_model = FALSE,
  use_seed = 999,
  verbose = FALSE
)

Arguments

mat

matrix, the expression/activity matrix of the IDs (gene/transcript/probe) belonging to one gene. Rows are IDs, columns are samples. Row names (IDs) and column names (samples) are strongly recommended. For example, if geneA has two probes, A1 and A2, measured across 6 samples (Case-rep1, Case-rep2, Case-rep3, Control-rep1, Control-rep2 and Control-rep3), the mat of geneA is a 2*6 numeric matrix. Likewise, if geneA has only one probe, mat is a one-row matrix.

use_obs_class

a vector of characters, the category of each sample. If the vector is not named, the order of samples in use_obs_class must match the column order of mat. Users can call get_obs_label to create this vector.
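One way to make sure the class labels line up with the columns of mat is to name the vector by sample and reorder it before the call. The following is a minimal base R sketch that reuses the six-sample layout described for mat. The matrix values are placeholders, and the explicit reordering step is an illustration rather than something bid is documented to require.

```r
# Placeholder 2 x 6 matrix; only the column names matter here.
sample_ids <- c("Case-rep1", "Case-rep2", "Case-rep3",
                "Control-rep1", "Control-rep2", "Control-rep3")
mat <- matrix(0, nrow = 2, ncol = 6,
              dimnames = list(c("A1", "A2"), sample_ids))

# Named class vector, deliberately listed in a different sample order.
obs_class <- c("Control-rep1" = "Control", "Case-rep1" = "Case",
               "Control-rep2" = "Control", "Case-rep2" = "Case",
               "Control-rep3" = "Control", "Case-rep3" = "Case")

# Reorder by sample name so that element i describes column i of mat.
obs_class <- obs_class[colnames(mat)]
stopifnot(identical(names(obs_class), colnames(mat)))
print(obs_class)
```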

class_order

a vector of characters, the order of the sample categories. The first class in this vector is treated as the control group by default. If NULL, the classes are sorted alphabetically. Default is NULL.

class_ordered

logical, if TRUE, the classes in class_order are treated as ordered, and their order must be consistent with the phenotypic trend, such as "low", "medium", "high". Default is TRUE.
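The roles of class_order and class_ordered can be pictured with base R factors. This sketch uses a hypothetical three-level phenotype and only illustrates why the alphabetical fallback may not follow the phenotypic trend. It does not describe how bid handles the classes internally.

```r
# Hypothetical three-level phenotype for six samples.
obs_class <- c("low", "low", "medium", "medium", "high", "high")

# Alphabetical ordering, the fallback described for class_order = NULL.
alphabetical <- sort(unique(obs_class))

# Explicit ordering that follows the phenotypic trend; the first level
# plays the role of the control group.
class_order <- c("low", "medium", "high")
cls <- factor(obs_class, levels = class_order, ordered = TRUE)

print(alphabetical)
print(levels(cls))
```

Here the alphabetical order puts "high" first, so an explicit class_order is the safer choice whenever the trend matters.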

method

character, users can choose between "MLE" and "Bayesian". "MLE" (maximum likelihood estimation) calls a generalized linear model (glm/glmer) to perform the regression. "Bayesian" calls a Bayesian generalized linear model (bayesglm) or a multivariate generalized linear mixed model (MCMCglmm) to perform the regression. Default is "Bayesian".

family

character, a family function, or the result of a call to a family function. This parameter defines the model's error distribution. See ?family for details. Currently, the options are gaussian, poisson, binomial (for two-group sample classes), category (for multi-group sample classes) and ordinal (for multi-group sample classes with class_ordered=TRUE). If set to gaussian or poisson, the response variable in the regression model is the expression level, and the independent variable is the sample's phenotype. If set to binomial, the response variable is the sample phenotype, and the independent variable is the expression level. For binomial, category and ordinal input, the family is reset automatically, based on the sample's class levels and the setting of class_ordered. Default is gaussian.
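The change in regression direction between gaussian and binomial can be sketched with plain glm calls. This example uses the values of probe A2 from the Examples section and skips the standardization that std = TRUE would apply, so the coefficients are not expected to match what bid reports. It only shows which variable sits on which side of the formula.

```r
# Expression of one probe across three Case and three Control samples.
expr <- c(1.84579, 2.0356, 2.6025, 1.62954, 1.88281, 1.29604)
cls  <- factor(c(rep("Case", 3), rep("Control", 3)),
               levels = c("Control", "Case"))  # Control is the reference

# gaussian: expression is the response, phenotype is the predictor.
fit_gaussian <- glm(expr ~ cls, family = gaussian)

# binomial: phenotype is the response, expression is the predictor.
fit_binomial <- glm(cls ~ expr, family = binomial)

print(coef(fit_gaussian))
print(coef(fit_binomial))
```

In the gaussian fit the clsCase coefficient is the difference in mean expression between Case and Control. In the binomial fit the expr coefficient is the change in the log-odds of being a Case sample per unit of expression.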

pooling

character, users can choose from "full", "no" and "partial". "full" uses probes as independent observations. "no" uses probes as independent variables in the regression model. "partial" uses probes as a random effect in the regression model. Default is "full".
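The three pooling strategies can be pictured as three model formulas on a long-format table with one row per probe-by-sample measurement. The sketch below uses base R lm on the matrix from the Examples section. The formulas are one reading of the descriptions above, not the exact models bid fits, and the "partial" case is shown only as a comment because it needs a mixed-model package.

```r
mat <- matrix(c(0.50099, 1.2108, 1.0524, -0.34881, -0.13441, -0.87112,
                1.84579, 2.0356, 2.6025, 1.62954, 1.88281, 1.29604),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("A1", "A2"), NULL))
cls <- factor(c(rep("Case", 3), rep("Control", 3)),
              levels = c("Control", "Case"))

# Long format: all samples of probe A1 first, then all samples of A2.
long <- data.frame(
  value = as.vector(t(mat)),
  probe = factor(rep(rownames(mat), each = ncol(mat))),
  cls   = rep(cls, times = nrow(mat))
)

# "full": every probe measurement is an independent observation.
fit_full <- lm(value ~ cls, data = long)

# "no": probe enters the model as an additional independent variable.
fit_no <- lm(value ~ cls + probe, data = long)

# "partial": probe as a random effect, for example a random intercept
# such as value ~ cls + (1 | probe) in a mixed-model package.

print(coef(summary(fit_full))["clsCase", ])
print(coef(summary(fit_no))["clsCase", ])
```

With this balanced layout both fits give the same estimate for the class effect, but the standard error differs because the second model accounts for the baseline difference between probes.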

prior.V.scale

numeric, the V in the parameter "prior" used in MCMCglmm. Only meaningful when method is "Bayesian" and pooling is "partial". Default is 0.02.

prior.R.nu

numeric, the nu of the R-structure in the parameter "prior" used in MCMCglmm. Only meaningful when method is "Bayesian" and pooling is "partial". Default is 1.

prior.G.nu

numeric, the nu of the G-structure in the parameter "prior" used in MCMCglmm. Only meaningful when method is "Bayesian" and pooling is "partial". Default is 2.

nitt

numeric, the parameter "nitt" used in MCMCglmm. Only meaningful when method is "Bayesian" and pooling is "partial". Default is 13000.

burnin

numeric, the parameter "burnin" used in MCMCglmm. Only meaningful when method is "Bayesian" and pooling is "partial". Default is 3000.

thin

numeric, the parameter "thin" used in MCMCglmm. Only meaningful when method is "Bayesian" and pooling is "partial". Default is 10.
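The three sampler settings together determine how many posterior draws are kept: the chain runs for nitt iterations, the first burnin iterations are discarded, and one draw is stored every thin iterations after that. A short arithmetic check for the defaults:

```r
nitt   <- 13000  # total MCMC iterations
burnin <- 3000   # iterations discarded at the start
thin   <- 10     # keep one draw out of every `thin` iterations

# Number of posterior draws retained for inference.
n_kept <- (nitt - burnin) / thin
print(n_kept)  # 1000
```

Raising nitt or lowering thin keeps more draws at the cost of run time.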

std

logical, if TRUE, the expression matrix will be normalized by column. Default is TRUE.

logTransformed

logical, TRUE if the input values have already been log-transformed. Default is TRUE.

log.base

numeric, the base of the log transformation when logTransformed is set to TRUE. Default is 2.

average.method

character, the method applied to calculate FC (fold change). Users can choose between "geometric" and "arithmetic". Default is "geometric".

pseudoCount

integer, the pseudo-count added to avoid "-Inf" showing up during log transformation in the FC (fold change) calculation. Default is 0.
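The difference between the two averaging methods can be sketched for values that are already log-transformed. The helper below, log_fc_sketch, is hypothetical and follows one common convention: "geometric" averages on the log scale, "arithmetic" averages on the raw scale and adds the pseudo-count before taking the log ratio. The exact formula used by bid may differ.

```r
# Hypothetical helper, not part of any package: log fold change of case
# versus control from values that are already log-transformed.
log_fc_sketch <- function(case, control, log.base = 2,
                          average.method = c("geometric", "arithmetic"),
                          pseudoCount = 0) {
  average.method <- match.arg(average.method)
  if (average.method == "geometric") {
    # The mean of logs is the log of the geometric mean, so this is the
    # log ratio of geometric means.
    mean(case) - mean(control)
  } else {
    # Back-transform, average on the raw scale, then take the log ratio.
    raw_case    <- mean(log.base^case)
    raw_control <- mean(log.base^control)
    log((raw_case + pseudoCount) / (raw_control + pseudoCount),
        base = log.base)
  }
}

case    <- c(1, 3)  # log2 values, illustrative only
control <- c(1, 1)
fc_geo   <- log_fc_sketch(case, control, average.method = "geometric")
fc_arith <- log_fc_sketch(case, control, average.method = "arithmetic")
print(c(geometric = fc_geo, arithmetic = fc_arith))
```

In this sketch the pseudo-count only matters when a raw-scale mean could be zero, which would otherwise produce an infinite log ratio.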

return_model

logical, if TRUE, the regression model will be returned; otherwise, only basic statistics from the model are returned. Default is FALSE.

use_seed

integer, the random seed. Default is 999.

verbose

logical, if TRUE, print out additional information during calculation. Default is FALSE.

Details

This is a core function used inside getDE.BID.2G. It gives users access to more options when calculating the statistics with the Bayesian inference method. In some cases the input expression matrix is at the probe/transcript level, while DE/DA at the gene level is expected. Choosing a pooling strategy handles these cases. The P-value is estimated from the posterior distribution of the coefficient.
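One common way to turn posterior draws of a coefficient into a two-sided P-value is to take twice the smaller of the two tail fractions around zero. The sketch below uses simulated draws as a stand-in for a real posterior sample. It illustrates the general idea only, and the exact calculation performed by bid is not stated here.

```r
set.seed(999)
# Simulated stand-in for 1000 posterior draws of the class coefficient.
draws <- rnorm(1000, mean = 0.5, sd = 0.25)

# Fraction of draws on each side of zero.
p_pos <- mean(draws > 0)
p_neg <- mean(draws < 0)

# Two-sided tail probability, capped at 1.
p_two_sided <- min(1, 2 * min(p_pos, p_neg))
print(p_two_sided)
```

A posterior that sits almost entirely on one side of zero gives a small value, while a posterior centered on zero gives a value close to 1.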

Value

Returns a one-row data frame with the calculated statistics for one gene/gene set if return_model is FALSE. Otherwise, the regression model is returned.

Examples

mat <- matrix(c(0.50099,1.2108,1.0524,-0.34881,-0.13441,-0.87112,
                1.84579,2.0356,2.6025,1.62954,1.88281,1.29604),
                nrow=2,byrow=TRUE)
rownames(mat) <- c('A1','A2')
colnames(mat) <- c('Case-rep1','Case-rep2','Case-rep3',
                  'Control-rep1','Control-rep2','Control-rep3')
res1 <- bid(mat=mat,
           use_obs_class = c(rep('Case',3),rep('Control',3)),
           class_order = c('Control','Case'))
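The other documented options can be tried on the same matrix. The following sketch assumes that bid is already available in the session (the package providing it has been attached), and the calls are skipped otherwise. Only arguments listed under Usage are used, and no output is shown because it depends on the installed version.

```r
mat <- matrix(c(0.50099, 1.2108, 1.0524, -0.34881, -0.13441, -0.87112,
                1.84579, 2.0356, 2.6025, 1.62954, 1.88281, 1.29604),
              nrow = 2, byrow = TRUE)
rownames(mat) <- c("A1", "A2")
colnames(mat) <- c("Case-rep1", "Case-rep2", "Case-rep3",
                   "Control-rep1", "Control-rep2", "Control-rep3")
obs_class <- c(rep("Case", 3), rep("Control", 3))

# Run only when a function named bid can be found in the session.
if (exists("bid", mode = "function")) {
  # Maximum likelihood instead of the Bayesian default.
  res_mle <- bid(mat = mat, use_obs_class = obs_class,
                 class_order = c("Control", "Case"),
                 method = "MLE", pooling = "full")

  # Bayesian fit with probes as independent variables, returning the
  # fitted model instead of the one-row summary.
  fit_no <- bid(mat = mat, use_obs_class = obs_class,
                class_order = c("Control", "Case"),
                method = "Bayesian", pooling = "no",
                return_model = TRUE)
}
```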


jyyulab/NetBID documentation built on Dec. 23, 2024, 6:34 a.m.