RegenrichSet: RegenrichSet object creator


View source: R/regenrichClasses.R

Description

This function creates a 'RegenrichSet' object. Its parameters fall into four groups.
First, parameters to provide raw data and sample information;
'expr' and 'colData'.

Second, parameters to perform differential expression analysis;
'method', 'minMeanExpr', 'design', 'reduced', 'contrast', 'coef', 'name', 'fitType', 'sfType', 'betaPrior', 'minReplicatesForReplace', 'useT', 'minmu', 'parallel', 'BPPARAM' (also for network inference), 'altHypothesis', 'listValues', 'cooksCutoff', 'independentFiltering', 'alpha', 'filter', 'theta', 'filterFun', 'addMLE', 'blind', 'ndups', 'spacing', 'block', 'correlation', 'weights', 'proportion', 'stdev.coef.lim', 'trend', 'robust', and 'winsor.tail.p'.

Third, parameters to perform regulator-target network inference;
'reg', 'networkConstruction', 'topNetPercent', 'directed', 'rowSample', 'softPower', 'networkType', 'TOMDenom', 'RsquaredCut', 'edgeThreshold', 'K', 'nbTrees', 'importanceMeasure', 'trace', 'BPPARAM' (also for differential expression analysis), and 'minR'.

Fourth, parameters to perform enrichment analysis:
'enrichTest', 'namedScoresCutoffs', 'minSize', 'maxSize', 'pvalueCutoff', 'qvalueCutoff', 'regAltName', 'universe', and 'nperm'.

Usage

RegenrichSet(
  expr,
  colData,
  rowData = NULL,
  method = c("Wald_DESeq2", "LRT_DESeq2", "limma", "LRT_LM"),
  minMeanExpr = NULL,
  design,
  reduced,
  contrast,
  coef = NULL,
  name,
  fitType = c("parametric", "local", "mean"),
  sfType = c("ratio", "poscounts", "iterate"),
  betaPrior,
  minReplicatesForReplace = 7,
  useT = FALSE,
  minmu = 0.5,
  parallel = FALSE,
  BPPARAM = bpparam(),
  altHypothesis = c("greaterAbs", "lessAbs", "greater", "less"),
  listValues = c(1, -1),
  cooksCutoff,
  independentFiltering = TRUE,
  alpha = 0.1,
  filter,
  theta,
  filterFun,
  addMLE = FALSE,
  blind = FALSE,
  ndups = 1,
  spacing = 1,
  block = NULL,
  correlation,
  weights = NULL,
  proportion = 0.01,
  stdev.coef.lim = c(0.1, 4),
  trend = FALSE,
  robust = FALSE,
  winsor.tail.p = c(0.05, 0.1),
  reg = TFs$TF_name,
  networkConstruction = c("COEN", "GRN", "new"),
  topNetPercent = 5,
  directed = FALSE,
  rowSample = FALSE,
  softPower = NULL,
  networkType = "unsigned",
  TOMDenom = "min",
  RsquaredCut = 0.85,
  edgeThreshold = NULL,
  K = "sqrt",
  nbTrees = 1000,
  importanceMeasure = "IncNodePurity",
  trace = FALSE,
  minR = 0.3,
  enrichTest = c("FET", "GSEA"),
  namedScoresCutoffs = 0.05,
  minSize = 5,
  maxSize = 5000,
  pvalueCutoff = 0.05,
  qvalueCutoff = 0.2,
  regAltName = NULL,
  universe = NULL,
  nperm = 10000
)

Arguments

expr

matrix or data.frame, expression profile of a set of genes or a set of proteins. If method = 'Wald_DESeq2' or 'LRT_DESeq2', only a non-negative integer matrix (read counts from RNA sequencing) is accepted.

colData

data frame, sample phenotype data. The rows of colData must correspond to the columns of expr.
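The correspondence between the rows of 'colData' and the columns of 'expr' can be checked before calling RegenrichSet. The following is a minimal base-R sketch; the objects `expr_toy` and `colData_toy` are hypothetical, and matching samples by name is an assumption about how the data are labelled.

```r
# Toy expression matrix: 3 genes x 4 samples (hypothetical data)
expr_toy <- matrix(1:12, nrow = 3,
                   dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# Toy sample table whose rows are in a different order than the columns
colData_toy <- data.frame(group = c("A", "B", "A", "B"),
                          row.names = c("s2", "s1", "s4", "s3"))

# Reorder the rows of colData so that they line up with the columns of expr
colData_toy <- colData_toy[colnames(expr_toy), , drop = FALSE]

# Both checks should pass before the objects are handed to RegenrichSet
stopifnot(nrow(colData_toy) == ncol(expr_toy),
          identical(rownames(colData_toy), colnames(expr_toy)))
```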

rowData

NULL or data frame, information of each row/gene. Default is NULL, which will generate a DataFrame of three columns, i.e., "gene", "p", and "logFC".

method

either 'Wald_DESeq2', 'LRT_DESeq2', 'limma', or 'LRT_LM' for the differential expression analysis.

  • When method = 'Wald_DESeq2', the Wald test in DESeq2 package is used;

  • When method = 'LRT_DESeq2', the likelihood ratio test (LRT) in DESeq2 package is used;

  • When method = 'limma', the 'ls' method and empirical Bayes method in limma package are used to calculate moderated t-statistics and differential p-values;

  • When method = 'LRT_LM', a likelihood ratio test is performed for each row of 'expr' to compare the two linear models specified by the 'design' and 'reduced' arguments. In this case, the fold changes are not calculated but set to 0.

minMeanExpr

numeric, the cutoff of gene average expression for pre-filtering. The rows of 'expr' with average expression < minMeanExpr are removed. The higher 'minMeanExpr' is, the more genes are excluded from testing.
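The effect of the cutoff can be illustrated with base R. This is only a sketch of the rule stated above (rows with mean below the cutoff are dropped), not the package's internal code; `expr_toy` is hypothetical data.

```r
# Hypothetical matrix: 5 genes x 4 samples
expr_toy <- rbind(gene1 = c(0, 0, 1, 0),
                  gene2 = c(5, 6, 7, 8),
                  gene3 = c(2, 2, 2, 2),
                  gene4 = c(0, 0, 0, 0),
                  gene5 = c(10, 12, 9, 11))
minMeanExpr <- 2

# Rows with average expression < minMeanExpr are removed,
# so rows with a mean equal to the cutoff are kept
keep <- rowMeans(expr_toy) >= minMeanExpr
expr_filtered <- expr_toy[keep, , drop = FALSE]
rownames(expr_filtered)
```

Raising `minMeanExpr` in this sketch leaves fewer rows in `expr_filtered`.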

design

either a model formula or a model matrix. For method = 'LRT_DESeq2' or 'LRT_LM', the design is the full model formula/matrix. For method = 'limma', if design is a formula, the model matrix is constructed using model.matrix(design, colData), so every variable in the design formula must be a column name of 'colData'.
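How a design formula is turned into a model matrix can be seen with base R alone. In this sketch `colData_toy` is a hypothetical sample table; only model.matrix(design, colData) comes from the description above.

```r
# Hypothetical sample table: 3 patients measured at two time points
colData_toy <- data.frame(
  patientID = factor(c("P1", "P1", "P2", "P2", "P3", "P3")),
  week      = factor(c("0", "3", "0", "3", "0", "3"))
)

design_formula <- ~ 0 + patientID + week

# Every variable in the formula must be a column of colData
stopifnot(all(all.vars(design_formula) %in% colnames(colData_toy)))

# One row per sample, one column per model coefficient
design_matrix <- model.matrix(design_formula, colData_toy)
colnames(design_matrix)
```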

reduced

This argument is used only when method = 'LRT_DESeq2' or 'LRT_LM'. It is a reduced formula/matrix to compare against. If the design is a model matrix, 'reduced' must also be a model matrix.
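The idea of comparing a full model against a reduced model can be sketched for a single gene with base R. The data are simulated, and the chi-squared approximation used for the p-value is an assumption made for illustration; the package may compute its test statistic differently.

```r
set.seed(42)

# Hypothetical sample table: 4 patients, 2 time points each
colData_toy <- data.frame(
  patientID = factor(rep(c("P1", "P2", "P3", "P4"), each = 2)),
  week      = factor(rep(c("0", "3"), times = 4))
)

# Simulated expression of one gene with a week effect (hypothetical values)
y <- 5 + 2 * (colData_toy$week == "3") + rnorm(8, sd = 0.5)

# Full model (plays the role of 'design') and nested model (plays 'reduced')
full_fit    <- lm(y ~ patientID + week, data = colData_toy)
reduced_fit <- lm(y ~ patientID, data = colData_toy)

# Likelihood ratio statistic and its chi-squared p-value
lr_stat <- as.numeric(2 * (logLik(full_fit) - logLik(reduced_fit)))
df_diff <- attr(logLik(full_fit), "df") - attr(logLik(reduced_fit), "df")
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
```

Here the reduced model drops the 'week' term, so the test asks whether 'week' explains expression beyond the patient effect.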

contrast

The argument is used only when method = 'LRT_DESeq2', 'Wald_DESeq2', or 'limma'.
When method = 'LRT_DESeq2', or 'Wald_DESeq2', it specifies what comparison to extract from the 'DESeqDataSet' object to build a results table (when method = 'LRT_DESeq2', this does not affect the value of 'stat', 'pvalue', or 'padj').
It can be in one of the following three formats:

  • a character vector with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change;

  • a list of 1 or 2 character vector(s): the first element specifies the names of the fold changes for the numerator, and the second element (optional) specifies the names of the fold changes for the denominator. These names should be elements of getResultsNames(design, colData);

  • a numeric contrast vector with one element for each element in getResultsNames(design, colData).

When method = 'limma', it can be in one of the following two formats:

  • a numeric matrix with rows corresponding to coefficients in design matrix and columns containing contrasts;

  • a numeric vector if there is only one contrast. Each element of the vector corresponds to coefficients in design matrix. This is similar to the third format of contrast when method = 'LRT_DESeq2', or 'Wald_DESeq2'.
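The two numeric formats can be built by hand from the columns of a design matrix. This base-R sketch uses a hypothetical `colData_toy`; the contrast names 'week3' and 'P2vsP1' are chosen for illustration only.

```r
# Hypothetical sample table and the resulting design matrix
colData_toy <- data.frame(
  patientID = factor(c("P1", "P1", "P2", "P2", "P3", "P3")),
  week      = factor(c("0", "3", "0", "3", "0", "3"))
)
design_matrix <- model.matrix(~ 0 + patientID + week, colData_toy)

# Single contrast as a numeric vector: one element per coefficient,
# here selecting the last coefficient of the design matrix
contrast_vec <- c(rep(0, ncol(design_matrix) - 1), 1)
names(contrast_vec) <- colnames(design_matrix)

# Several contrasts as a numeric matrix: rows are coefficients,
# columns are contrasts
contrast_mat <- cbind(week3  = contrast_vec,
                      P2vsP1 = c(-1, 1, 0, 0))
rownames(contrast_mat) <- colnames(design_matrix)
contrast_mat
```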

coef

The argument is used only when method = 'limma'. (Vector of) column number or column name specifying which coefficient or contrast of the linear model is of interest. Default is NULL.

name

This argument is used only when method = 'LRT_DESeq2' or 'Wald_DESeq2'. The name of the individual effect (coefficient) for building a results table. Use this argument rather than contrast for continuous variables, individual effects, or individual interaction terms. The value provided to name must be an element of getResultsNames(design, colData).

fitType

either 'parametric', 'local', or 'mean' for the type of fitting of dispersions to the mean intensity. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See DESeq from DESeq2 package for more details. Default is 'parametric'.

sfType

either 'ratio', 'poscounts', or 'iterate' for the type of size factor estimation. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See DESeq from DESeq2 package for more details. Default is 'ratio'.

betaPrior

This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See DESeq from DESeq2 package for more details.

minReplicatesForReplace

This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See DESeq from DESeq2 package for more details. Default is 7.

useT

This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See DESeq from DESeq2 package for more details. Default is FALSE.

minmu

This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See DESeq from DESeq2 package for more details. Default is 0.5.

parallel

logical, whether to run the computation in parallel (applies only to differential analysis with method = 'Wald_DESeq2' or 'LRT_DESeq2'). Default is FALSE.

BPPARAM

parameters for parallel computing (default is bpparam()).

altHypothesis

either 'greaterAbs', 'lessAbs', 'greater', or 'less'. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See results from DESeq2 package for more details. Default is 'greaterAbs'.

listValues

This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See results from DESeq2 package for more details. Default is c(1, -1).

cooksCutoff

threshold on Cook's distance, such that if one or more samples for a row have a higher distance, the p-value for the row is set to NA. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See results from DESeq2 package for more details.

independentFiltering

logical, whether independent filtering should be applied automatically. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See results from DESeq2 package for more details. Default is TRUE.

alpha

the significance cutoff used for optimizing the independent filtering. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See results from DESeq2 package for more details. Default is 0.1.

filter

the vector of filter statistics over which the independent filtering is optimized. By default the mean of normalized counts is used. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See results from DESeq2 package for more details.

theta

the quantiles at which to assess the number of rejections from independent filtering. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See results from DESeq2 package for more details.

filterFun

an optional custom function for performing independent filtering and p-value adjustment. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See results from DESeq2 package for more details.

addMLE

if betaPrior=TRUE was used, whether the 'unshrunken' maximum likelihood estimates (MLE) of log2 fold change should be added as a column to the results table. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See results from DESeq2 package for more details. Default is FALSE.

blind

logical, whether to blind the transformation to the experimental design. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See vst from DESeq2 package for more details. Default is FALSE, which is different from the default of the vst function.

ndups

positive integer giving the number of times each distinct probe is printed on each array. This argument is used only when method = 'limma'. See lmFit from limma package for more details. Default is 1.

spacing

positive integer giving the spacing between duplicate occurrences of the same probe, spacing=1 for consecutive rows. This argument is used only when method = 'limma'. See lmFit from limma package for more details. Default is 1.

block

vector or factor specifying a blocking variable on the arrays. Has length equal to the number of arrays. Must be NULL if ndups > 2. This argument is used only when method = 'limma'. See lmFit from limma package for more details. Default is NULL.

correlation

the inter-duplicate or inter-technical replicate correlation. The correlation value should be estimated using the duplicateCorrelation function. This argument is used only when method = 'limma'. See lmFit from limma package for more details.

weights

non-negative precision weights. Can be a numeric matrix of individual weights of same size as the object expression matrix, or a numeric vector of array weights with length equal to ncol of the expression matrix, or a numeric vector of gene weights with length equal to nrow of the expression matrix. This argument is used only when method = 'limma' or 'LRT_LM'. See lmFit from limma package for more details. Default is NULL.

proportion

numeric value between 0 and 1, assumed proportion of genes which are differentially expressed. This argument is used only when method = 'limma'. See eBayes from limma package for more details. Default is 0.01.

stdev.coef.lim

numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes. This argument is used only when method = 'limma'. See eBayes from limma package for more details. Default is c(0.1, 4).

trend

logical, should an intensity-trend be allowed for the prior variance? This argument is used only when method = 'limma'. See eBayes from limma package for more details. Default is FALSE, meaning that the prior variance is constant.

robust

logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances? This argument is used only when method = 'limma'. See eBayes from limma package for more details. Default is FALSE.

winsor.tail.p

numeric vector of length 1 or 2, giving left and right tail proportions of x to Winsorize. Used only when method = 'limma' and robust=TRUE. See eBayes from limma package for more details. Default is c(0.05, 0.1).

reg

a vector of regulator names (IDs). By default, these are transcription (co-)factors defined by three sources (literature/databases), namely RegNet, TRRUST, and Marbach2016. The type of names or IDs of these regulators (for example ENSEMBL gene ID, Entrez gene ID, or gene symbol/name) must be the same as the type of names or IDs in the regulator-target network.

networkConstruction

the method to construct the network. Possible values are:
'COEN', coexpression network;
'GRN', gene regulatory network by random forest;
'new' (default), meaning a network provided by the user, rather than inferred from the expression data.

topNetPercent

numeric, the percentage of the top edges in the full network that is retained. Default is 5, meaning the top 5% of edges. This value must be between 0 and 100.
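Retaining a top percentage of edges can be pictured as ranking edges by weight and keeping the head of the list. This is a minimal base-R sketch; the edge list `edges`, its column names, and the use of ceiling() for rounding are assumptions for illustration, not the package's implementation.

```r
# Hypothetical weighted edge list: 4 regulators x 5 targets = 20 edges
edges <- data.frame(
  from   = paste0("reg", rep(1:4, each = 5)),
  to     = paste0("gene", rep(1:5, times = 4)),
  weight = (1:20) / 20
)

topNetPercent <- 10
stopifnot(topNetPercent > 0, topNetPercent <= 100)

# Number of edges to keep, then the strongest edges by weight
n_keep <- ceiling(nrow(edges) * topNetPercent / 100)
top_edges <- edges[order(edges$weight, decreasing = TRUE), ][seq_len(n_keep), ]
top_edges
```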

directed

logical, whether the network is directed. Default is FALSE.

rowSample

logical, if TRUE, each row represents a sample. Otherwise, each column represents a sample. Default is FALSE.

softPower

numeric, a soft power to achieve scale-free topology. If not provided, the parameter will be picked automatically by the plotSoftPower function.

networkType

network type. Allowed values are (unique abbreviations of) 'unsigned' (default), 'signed', 'signed hybrid'. See adjacency.

TOMDenom

a character string specifying the TOM variant to be used. Recognized values are 'min' giving the standard TOM described in Zhang and Horvath (2005), and 'mean' in which the min function in the denominator is replaced by mean. The 'mean' may produce better results but at this time should be considered experimental.

RsquaredCut

desired minimum scale free topology fitting index R^2. Default is 0.85.

edgeThreshold

numeric, the threshold for removing low-weight edges. Default is NULL, which means no edges will be removed.

K

integer or character. The number of features in each tree; can be either an integer, 'sqrt', or 'all'. 'sqrt' denotes sqrt(the number of 'reg'), 'all' means the number of 'reg'. Default is 'sqrt'.

nbTrees

integer. The number of trees. Default is 1000.

importanceMeasure

character. importanceMeasure can be '%IncMSE' or 'IncNodePurity', corresponding to type = 1 and 2 in the importance function, respectively. Default is 'IncNodePurity' (decrease in node impurity), which is faster than '%IncMSE' (decrease in accuracy).

trace

logical, whether to show the progress. Default is FALSE.

minR

numeric. The minimum correlation coefficient of prediction, used to control model accuracy. Default is 0.3.

enrichTest

character, specifying the enrichment analysis method, which is either 'FET' (Fisher's exact test) or 'GSEA' (gene set enrichment analysis).
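The idea behind the 'FET' option can be sketched with base R's fisher.test on a 2 x 2 table of target membership versus differential expression. The gene sets below are hypothetical, and the one-sided over-representation setup is an assumption; the package may construct its test differently.

```r
# Hypothetical gene sets
universe <- paste0("gene", 1:100)             # background target genes
de_genes <- paste0("gene", 1:20)              # differentially expressed genes
targets  <- paste0("gene", c(1:12, 51:58))    # targets of one regulator

in_target <- universe %in% targets
in_de     <- universe %in% de_genes

# 2 x 2 contingency table; the top-left cell counts targets that are also DE
tab <- table(target = factor(in_target, levels = c(TRUE, FALSE)),
             DE     = factor(in_de, levels = c(TRUE, FALSE)))

# One-sided test for over-representation of DE genes among the targets
ft <- fisher.test(tab, alternative = "greater")
ft$p.value
```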

namedScoresCutoffs

numeric, the significance cutoff for the differential analysis p value. Default is 0.05.

minSize

The minimum number (default 5) of target genes.

maxSize

The maximum number (default 5000) of target genes.

pvalueCutoff

numeric, the significance cutoff for adjusted enrichment p value. This is used for obtaining the 'topResult' slot in the final 'Enrich' object. Default is 0.05.

qvalueCutoff

numeric, the significance cutoff of enrichment q-value. Default is 0.2.

regAltName

alternative name for regulator. Default is NULL.

universe

a character vector of background target genes. Default is NULL.

nperm

integer, number of permutations. The minimal possible nominal p-value is about 1/nperm. Default is 10000.
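Why the number of permutations bounds the smallest attainable p-value can be seen in a small base-R sketch. The scores are simulated, and the "+1" correction is a common convention assumed here for illustration, not necessarily what the package uses.

```r
set.seed(123)
nperm <- 1000

# Hypothetical observed score, far beyond anything the null produces
observed <- 10

# Scores from nperm permutations, simulated here as standard normal draws
null_scores <- rnorm(nperm)

# Empirical p-value: even when no permuted score reaches the observed one,
# the estimate cannot fall below 1 / (nperm + 1), i.e. about 1 / nperm
p_perm <- (sum(null_scores >= observed) + 1) / (nperm + 1)
p_perm
```

Increasing `nperm` lowers this floor at the cost of longer run time.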

Value

an object of RegenrichSet class.

Examples

library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')
object

RegEnrich documentation built on March 7, 2021, 2 a.m.