RegenrichSet: RegenrichSet object creator
In WTaoUMC/RegEnrich: Gene regulator enrichment analysis


This function creates a 'RegenrichSet' object. Its parameters fall into four groups.
First, parameters to provide raw data and sample information:
'expr' and 'colData'.

Second, parameters to perform differential expression analysis:
'method', 'minMeanExpr', 'design', 'reduced', 'contrast', 'coef', 'name', 'fitType', 'sfType', 'betaPrior', 'minReplicatesForReplace', 'useT', 'minmu', 'parallel', 'BPPARAM' (also for network inference), 'altHypothesis', 'listValues', 'cooksCutoff', 'independentFiltering', 'alpha', 'filter', 'theta', 'filterFun', 'addMLE', 'blind', 'ndups', 'spacing', 'block', 'correlation', 'weights', 'proportion', 'stdev.coef.lim', 'trend', 'robust', and 'winsor.tail.p'.

Third, parameters to perform regulator-target network inference:
'reg', 'networkConstruction', 'topNetPercent', 'directed', 'rowSample', 'softPower', 'networkType', 'TOMDenom', 'RsquaredCut', 'edgeThreshold', 'K', 'nbTrees', 'importanceMeasure', 'trace', 'BPPARAM' (also for differential expression analysis), and 'minR'.

Fourth, parameters to perform enrichment analysis:
'enrichTest', 'namedScoresCutoffs', 'minSize', 'maxSize', 'pvalueCutoff', 'qvalueCutoff', 'regAltName', 'universe', and 'nperm'.

RegenrichSet(
  expr,
  colData,
  rowData = NULL,
  method = c("Wald_DESeq2", "LRT_DESeq2", "limma", "LRT_LM"),
  minMeanExpr = NULL,
  design,
  reduced,
  contrast,
  coef = NULL,
  name,
  fitType = c("parametric", "local", "mean"),
  sfType = c("ratio", "poscounts", "iterate"),
  betaPrior,
  minReplicatesForReplace = 7,
  useT = FALSE,
  minmu = 0.5,
  parallel = FALSE,
  BPPARAM = bpparam(),
  altHypothesis = c("greaterAbs", "lessAbs", "greater", "less"),
  listValues = c(1, -1),
  cooksCutoff,
  independentFiltering = TRUE,
  alpha = 0.1,
  filter,
  theta,
  filterFun,
  addMLE = FALSE,
  blind = FALSE,
  ndups = 1,
  spacing = 1,
  block = NULL,
  correlation,
  weights = NULL,
  proportion = 0.01,
  stdev.coef.lim = c(0.1, 4),
  trend = FALSE,
  robust = FALSE,
  winsor.tail.p = c(0.05, 0.1),
  reg = TFs$TF_name,
  networkConstruction = c("COEN", "GRN", "new"),
  topNetPercent = 5,
  directed = FALSE,
  rowSample = FALSE,
  softPower = NULL,
  networkType = "unsigned",
  TOMDenom = "min",
  RsquaredCut = 0.85,
  edgeThreshold = NULL,
  K = "sqrt",
  nbTrees = 1000,
  importanceMeasure = "IncNodePurity",
  trace = FALSE,
  minR = 0.3,
  enrichTest = c("FET", "GSEA"),
  namedScoresCutoffs = 0.05,
  minSize = 5,
  maxSize = 5000,
  pvalueCutoff = 0.05,
  qvalueCutoff = 0.2,
  regAltName = NULL,
  universe = NULL,
  nperm = 10000
)

`expr`	matrix or data.frame, expression profile of a set of genes or a set of proteins. If `method` is 'Wald_DESeq2' or 'LRT_DESeq2', only a non-negative integer matrix (read counts from RNA sequencing) is accepted.
`colData`	data frame, sample phenotype data. The rows of colData must correspond to the columns of expr.
`rowData`	NULL or data frame, information about each row/gene. Default is NULL, which will generate a DataFrame of three columns, i.e., "gene", "p", and "logFC".
`method`	either 'Wald_DESeq2', 'LRT_DESeq2', 'limma', or 'LRT_LM' for the differential expression analysis. When method = 'Wald_DESeq2', the Wald test in the DESeq2 package is used; when method = 'LRT_DESeq2', the likelihood ratio test (LRT) in the DESeq2 package is used; when method = 'limma', the 'ls' method and empirical Bayes method in the limma package are used to calculate moderated t-statistics and differential p-values; when method = 'LRT_LM', a likelihood ratio test is performed for each row of 'expr' to compare the two linear models specified by the 'design' and 'reduced' arguments. In this case, the fold changes are not calculated but set to 0.
`minMeanExpr`	numeric, the cutoff of gene average expression for pre-filtering. Rows of 'expr' with average expression < minMeanExpr are removed. The higher 'minMeanExpr' is, the more genes are excluded from testing.
`design`	either model formula or model matrix. For method = 'LRT_DESeq2' or 'LRT_LM', the design is the full model formula/matrix. For method = 'limma', and if design is a formula, the model matrix is constructed using model.matrix(design, colData), so the name of each term in the design formula must be included in the column names of 'colData'.
`reduced`	This argument is used only when method = 'LRT_DESeq2' or 'LRT_LM'. It is the reduced formula/matrix to compare against. If the design is a model matrix, 'reduced' must also be a model matrix.
`contrast`	This argument is used only when method = 'LRT_DESeq2', 'Wald_DESeq2', or 'limma'. When method = 'LRT_DESeq2' or 'Wald_DESeq2', it specifies what comparison to extract from the 'DESeqDataSet' object to build a results table (when method = 'LRT_DESeq2', this does not affect the value of 'stat', 'pvalue', or 'padj'). It can be in one of the following three formats: (1) a character vector with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change; (2) a list of 1 or 2 character vector(s): the first element specifies the names of the fold changes for the numerator, and the second element (optional) specifies the names of the fold changes for the denominator. These names should be elements of `getResultsNames(design, colData)`; (3) a numeric contrast vector with one element for each element in `getResultsNames(design, colData)`. When method = 'limma', it can be in one of the following two formats: (1) a numeric matrix with rows corresponding to coefficients in the design matrix and columns containing contrasts; (2) a numeric vector if there is only one contrast, in which each element corresponds to a coefficient in the design matrix. This is similar to the third format of contrast when method = 'LRT_DESeq2' or 'Wald_DESeq2'.
`coef`	The argument is used only when method = 'limma'. (Vector of) column number or column name specifying which coefficient or contrast of the linear model is of interest. Default is NULL.
`name`	This argument is used only when method = 'LRT_DESeq2' or 'Wald_DESeq2'. The name of the individual effect (coefficient) for building a results table. Use this argument rather than contrast for continuous variables, individual effects, or individual interaction terms. The value provided to name must be an element of `getResultsNames(design, colData)`.
`fitType`	either 'parametric', 'local', or 'mean' for the type of fitting of dispersions to the mean intensity. This argument is used only when method = 'Wald_DESeq2' or 'LRT_DESeq2'. See `DESeq` from DESeq2 package for more details. Default is 'parametric'.
`sfType`	either 'ratio', 'poscounts', or 'iterate' for the type of size factor estimation. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `DESeq` from DESeq2 package for more details. Default is 'ratio'.
`betaPrior`	This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `DESeq` from DESeq2 package for more details.
`minReplicatesForReplace`	This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `DESeq` from DESeq2 package for more details. Default is 7.
`useT`	This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `DESeq` from DESeq2 package for more details. Default is FALSE.
`minmu`	This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `DESeq` from DESeq2 package for more details. Default is 0.5.
`parallel`	logical, whether to run the computation in parallel (only for differential analysis with method = "Wald_DESeq2" or "LRT_DESeq2"). Default is FALSE.
`BPPARAM`	parameters for parallel computing (default is `bpparam()`).
`altHypothesis`	one of 'greaterAbs', 'lessAbs', 'greater', or 'less'. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details. Default is 'greaterAbs'.
`listValues`	This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details. Default is c(1, -1).
`cooksCutoff`	threshold on Cook's distance, such that if one or more samples for a row have a higher distance, the p-value for the row is set to NA. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details.
`independentFiltering`	logical, whether independent filtering should be applied automatically. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details. Default is TRUE.
`alpha`	the significance cutoff used for optimizing the independent filtering. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details. Default is 0.1.
`filter`	the vector of filter statistics over which the independent filtering is optimized. By default the mean of normalized counts is used. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details.
`theta`	the quantiles at which to assess the number of rejections from independent filtering. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details.
`filterFun`	an optional custom function for performing independent filtering and p-value adjustment. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details.
`addMLE`	if betaPrior=TRUE was used, whether the 'unshrunken' maximum likelihood estimates (MLE) of log2 fold change should be added as a column to the results table. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `results` from DESeq2 package for more details. Default is FALSE.
`blind`	logical, whether to blind the transformation to the experimental design. This argument is used only when method = either 'Wald_DESeq2' or 'LRT_DESeq2'. See `vst` from DESeq2 package for more details. Default is FALSE, which is different from the default of vst function.
`ndups`	positive integer giving the number of times each distinct probe is printed on each array. This argument is used only when method = 'limma'. See `lmFit` from limma package for more details. Default is 1.
`spacing`	positive integer giving the spacing between duplicate occurrences of the same probe, spacing=1 for consecutive rows. This argument is used only when method = 'limma'. See `lmFit` from limma package for more details. Default is 1.
`block`	vector or factor specifying a blocking variable on the arrays. Has length equal to the number of arrays. Must be NULL if ndups > 2. This argument is used only when method = 'limma'. See `lmFit` from limma package for more details. Default is NULL.
`correlation`	the inter-duplicate or inter-technical replicate correlation. The correlation value should be estimated using the `duplicateCorrelation` function. This argument is used only when method = 'limma'. See `lmFit` from limma package for more details.
`weights`	non-negative precision weights. Can be a numeric matrix of individual weights of same size as the object expression matrix, or a numeric vector of array weights with length equal to ncol of the expression matrix, or a numeric vector of gene weights with length equal to nrow of the expression matrix. This argument is used only when method = 'limma' or 'LRT_LM'. See `lmFit` from limma package for more details. Default is NULL.
`proportion`	numeric value between 0 and 1, assumed proportion of genes which are differentially expressed. This argument is used only when method = 'limma'. See `eBayes` from limma package for more details. Default is 0.01.
`stdev.coef.lim`	numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes. This argument is used only when method = 'limma'. See `eBayes` from limma package for more details. Default is c(0.1, 4).
`trend`	logical, should an intensity-trend be allowed for the prior variance? This argument is used only when method = 'limma'. See `eBayes` from limma package for more details. Default is FALSE, meaning that the prior variance is constant.
`robust`	logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances? This argument is used only when method = 'limma'. See `eBayes` from limma package for more details. Default is FALSE.
`winsor.tail.p`	numeric vector of length 1 or 2, giving left and right tail proportions of x to Winsorize. Used only when method = 'limma' and robust=TRUE. See `eBayes` from limma package for more details. Default is c(0.05, 0.1).
`reg`	a vector of regulator names (ID). By default, these are transcription (co-)factors defined by three literature sources/databases, namely RegNet, TRRUST, and Marbach2016. The type of regulator names or IDs (for example ENSEMBL gene ID, Entrez gene ID, or gene symbol/name) must be the same as the type of names or IDs in the regulator-target network.
`networkConstruction`	the method to construct the network. Possible values are: 'COEN', coexpression network; 'GRN', gene regulatory network by random forest; 'new' (default), meaning a network provided by the user rather than inferred from the expression data.
`topNetPercent`	numeric, what percentage of the top edges in the full network is retained. Default is 5, meaning the top 5% of edges. This value must be between 0 and 100.
`directed`	logical, whether the network is directed. Default is FALSE.
`rowSample`	logical, if TRUE, each row represents a sample. Otherwise, each column represents a sample. Default is FALSE.
`softPower`	numeric, a soft power to achieve scale free topology. If not provided, the parameter will be picked automatically by `plotSoftPower` function.
`networkType`	network type. Allowed values are (unique abbreviations of) 'unsigned' (default), 'signed', 'signed hybrid'. See `adjacency`.
`TOMDenom`	a character string specifying the TOM variant to be used. Recognized values are 'min' giving the standard TOM described in Zhang and Horvath (2005), and 'mean' in which the min function in the denominator is replaced by mean. The 'mean' may produce better results but at this time should be considered experimental.
`RsquaredCut`	desired minimum scale free topology fitting index R^2. Default is 0.85.
`edgeThreshold`	numeric, the threshold to remove the low weighted edges. Default is NULL, which means no edges will be removed.
`K`	integer or character. The number of features in each tree, which can be either an integer, 'sqrt', or 'all'. 'sqrt' denotes sqrt(the number of 'reg'), 'all' means the number of 'reg'. Default is 'sqrt'.
`nbTrees`	integer. The number of trees. Default is 1000.
`importanceMeasure`	character. importanceMeasure can be '%IncMSE' or 'IncNodePurity', corresponding to type = 1 and 2 in `importance` function, respectively. Default is 'IncNodePurity' (decrease in node impurity), which is faster than '%IncMSE' (decrease in accuracy).
`trace`	logical, whether to show the progress. Default is FALSE.
`minR`	numeric. The minimum correlation coefficient of prediction, used to control model accuracy. Default is 0.3.
`enrichTest`	character, specifying the enrichment analysis method, which is either 'FET' (Fisher's exact test) or 'GSEA' (gene set enrichment analysis).
`namedScoresCutoffs`	numeric, the significance cutoff for the differential analysis p value. Default is 0.05.
`minSize`	The minimum number (default 5) of target genes.
`maxSize`	The maximum number (default 5000) of target genes.
`pvalueCutoff`	numeric, the significance cutoff for adjusted enrichment p value. This is used for obtaining the 'topResult' slot in the final 'Enrich' object. Default is 0.05.
`qvalueCutoff`	numeric, the significance cutoff of enrichment q-value. Default is 0.2.
`regAltName`	alternative name for regulator. Default is NULL.
`universe`	a character vector of background target genes. Default is NULL.
`nperm`	integer, number of permutations. The minimal possible nominal p-value is about 1/nperm. Default is 10000.
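
The two thresholds 'minMeanExpr' and 'topNetPercent' can be illustrated with a minimal base R sketch. This is not RegEnrich's internal code: the toy matrix, the toy edge list, and all `toy_*` names are hypothetical, and ranking edges by a single `weight` column is an assumption made for the illustration.

```r
# Toy expression matrix: 4 genes x 3 samples (hypothetical values)
toy_expr <- matrix(
  c(0.2, 0.5, 0.8,   # geneA, low average expression
    1.0, 2.0, 3.0,   # geneB
    4.0, 5.0, 6.0,   # geneC
    0.0, 0.0, 0.0),  # geneD, not expressed
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("s", 1:3))
)

# minMeanExpr: drop rows whose average expression is below the cutoff
minMeanExpr <- 1
toy_filtered <- toy_expr[rowMeans(toy_expr) >= minMeanExpr, , drop = FALSE]
rownames(toy_filtered)

# Toy edge list with 100 weighted edges (hypothetical weights)
set.seed(1)
toy_edges <- data.frame(
  from   = paste0("reg", sample(5, 100, replace = TRUE)),
  to     = paste0("target", seq_len(100)),
  weight = runif(100)
)

# topNetPercent: keep the top 5% of edges, ranked by weight (assumed ranking)
topNetPercent <- 5
n_keep <- ceiling(nrow(toy_edges) * topNetPercent / 100)
keep_idx <- order(toy_edges$weight, decreasing = TRUE)[seq_len(n_keep)]
toy_top <- toy_edges[keep_idx, ]
nrow(toy_top)
```

A larger 'minMeanExpr' removes more rows before testing, and a smaller 'topNetPercent' leaves a sparser network for the enrichment step.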

An object of the RegenrichSet class.

library(RegEnrich)
data("Lyme_GSE63085")
data("TFs")

data = log2(Lyme_GSE63085$FPKM + 1)
colData = Lyme_GSE63085$sampleInfo

# Take first 2000 rows for example
data1 = data[seq(2000), ]

design = model.matrix(~0 + patientID + week, data = colData)

# Initializing a 'RegenrichSet' object
object = RegenrichSet(expr = data1,
                      colData = colData,
                      method = 'limma', minMeanExpr = 0,
                      design = design,
                      contrast = c(rep(0, ncol(design) - 1), 1),
                      networkConstruction = 'COEN',
                      enrichTest = 'FET')
object
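
To see why a contrast of the form `c(rep(0, ncol(design) - 1), 1)` selects the last coefficient of the design matrix, the following sketch rebuilds a comparable design from toy sample information. The `toy_colData` values and level names are made up for illustration and are not taken from the Lyme_GSE63085 data.

```r
# Toy sample information (hypothetical values, for illustration only)
toy_colData <- data.frame(
  patientID = factor(rep(c("P1", "P2", "P3"), each = 2)),
  week      = factor(rep(c("w0", "w3"), times = 3))
)

# Same formula as in the example: no intercept, one column per patient,
# plus a treatment-coded column for the non-reference level of 'week'
toy_design <- model.matrix(~0 + patientID + week, data = toy_colData)
colnames(toy_design)

# A numeric contrast with one element per design column;
# only the last coefficient gets weight 1
toy_contrast <- c(rep(0, ncol(toy_design) - 1), 1)
names(toy_contrast) <- colnames(toy_design)
toy_contrast
```

With this layout the patient columns absorb per-patient baselines, so the single non-zero element tests the 'week' effect within patients.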