AMARETTO_Initialize: AMARETTO_Initialize (version: reorder and filter MA_Matrix)


View source: R/amaretto_functions.R

Description

Initializes the seed clusters for an AMARETTO run. Requires processed gene expression (RNA-seq or microarray), CNV (usually from a GISTIC run), and methylation (from MethylMix, provided in this package) data. Uses the function CreateRegulatorData(); the results are fed into the function AMARETTO_Run().

Usage

AMARETTO_Initialize(MA_matrix = MA_matrix, CNV_matrix = NULL,
  MET_matrix = NULL, Driver_list = NULL, NrModules, VarPercentage,
  PvalueThreshold = 0.001, RsquareThreshold = 0.1, pmax = 10,
  NrCores = 1, OneRunStop = 0, method = "union")

Arguments

MA_matrix

Expression matrix, with genes in rows and samples in columns.

CNV_matrix

CNV matrix, with genes in rows and samples in columns.

MET_matrix

Methylation matrix, with genes in rows and samples in columns.
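The layout shared by the three matrices can be illustrated with a toy example. This is only a sketch: the gene and sample names below are made up, and the values are random numbers rather than real expression, CNV, or methylation data.

```r
# Toy data only: gene and sample names are hypothetical.
set.seed(1)
genes   <- c("GENE_A", "GENE_B", "GENE_C")
samples <- c("S1", "S2", "S3", "S4")

# Genes in rows, samples in columns, with both dimensions named.
MA_toy <- matrix(rnorm(length(genes) * length(samples)),
                 nrow = length(genes),
                 dimnames = list(genes, samples))

dim(MA_toy)       # number of genes, then number of samples
rownames(MA_toy)  # gene identifiers
colnames(MA_toy)  # sample identifiers
```

A CNV or methylation matrix would be laid out the same way, with its own set of genes in the rows.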

Driver_list

Custom list of driver genes to be considered in the analysis.

NrModules

Number of gene co-expression modules AMARETTO should search for. A value around 100 is usually acceptable, given the large number of possible driver-passenger gene combinations.

VarPercentage

Minimum percentage by variance for filtering of genes; for example, 75% would indicate that the CreateRegulatorData() function only analyses genes that have a variance above the 75th percentile across all samples.
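The percentile filter described above can be sketched in base R. This is not the code used inside CreateRegulatorData(); it assumes the filter is a strict "greater than" comparison against the variance quantile, and the toy matrix and variable names are hypothetical.

```r
# Toy expression matrix: 100 hypothetical genes by 10 hypothetical samples.
set.seed(42)
toy <- matrix(rnorm(100 * 10), nrow = 100,
              dimnames = list(paste0("gene", 1:100), paste0("S", 1:10)))

VarPercentage <- 75

# Variance of each gene across all samples.
gene_var <- apply(toy, 1, var)

# Variance value at the requested percentile.
cutoff <- quantile(gene_var, probs = VarPercentage / 100)

# Keep only genes whose variance lies above the cutoff (assumed strict).
filtered <- toy[gene_var > cutoff, , drop = FALSE]
nrow(filtered)
```

With 100 genes and a 75th-percentile cutoff, a quarter of the genes remain, which gives a feel for how strongly VarPercentage shrinks the input.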

PvalueThreshold

Threshold used to find relevant driver genes with CNV alterations: maximal p-value.

RsquareThreshold

Threshold used to find relevant driver genes with CNV alterations: minimal R-square value between CNV and gene expression data.
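To make the two CNV thresholds concrete, the sketch below tests a single simulated gene with a simple linear model. It is an illustration under assumptions: this page does not say which statistical test AMARETTO uses or whether the thresholds are inclusive, and the simulated data and variable names are hypothetical.

```r
# Simulated data for one hypothetical gene: expression partly driven by CNV.
set.seed(7)
n    <- 50
cnv  <- rnorm(n)
expr <- 0.8 * cnv + rnorm(n, sd = 0.5)

PvalueThreshold  <- 0.001
RsquareThreshold <- 0.1

# Regress expression on copy number (assumed model, for illustration only).
fit <- summary(lm(expr ~ cnv))

r_squared <- fit$r.squared
p_value   <- fit$coefficients["cnv", "Pr(>|t|)"]

# A gene is kept as a candidate driver only if it meets both criteria.
passes <- (p_value <= PvalueThreshold) && (r_squared >= RsquareThreshold)
passes
```

Lowering PvalueThreshold or raising RsquareThreshold makes the selection stricter, so fewer CNV-altered genes are retained as candidate drivers.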

pmax

'pmax' variable for the glmnet function from the glmnet package; the maximum number of variables ever to be nonzero. Should not be changed unless the user fully understands the AMARETTO algorithm and how its parameter choices affect model output.

NrCores

A numeric variable indicating the number of computer/server cores to use for parallelization. Default is 1, i.e. no parallelization. Please check your computer's or server's computing capacity before increasing this number. Parallelization is done via the RParallel package. Mac and Windows environments may behave differently when using parallelization.
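One way to check capacity before raising NrCores is to ask R how many cores it can see. The sketch below uses detectCores() from the base 'parallel' package; the cap of 4 and the choice to leave one core free are arbitrary assumptions, not recommendations from the AMARETTO authors.

```r
# Number of cores R detects on this machine (can be NA on some platforms).
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Leave one core free and never request more than 4 (arbitrary choices).
NrCores <- max(1L, min(4L, available - 1L))
NrCores
```

The resulting value could then be passed as the NrCores argument.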

OneRunStop

OneRunStop

method

Perform union or intersection of the driver genes evaluated from the input data matrices and custom driver gene list provided.
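The difference between the two options can be shown with base R set operations. This is a sketch only: combine_drivers() is a hypothetical helper, not an AMARETTO function, the gene vectors are made up, and the string "intersect" is an assumed label (this page documents only the default, "union").

```r
# Hypothetical driver sets: one derived from the data, one supplied by the user.
data_drivers   <- c("TP53", "MYC", "CTNNB1")
custom_drivers <- c("MYC", "CTNNB1", "AXIN1")

# Hypothetical helper illustrating the two ways of combining driver lists.
combine_drivers <- function(a, b, method = c("union", "intersect")) {
  method <- match.arg(method)
  if (method == "union") union(a, b) else intersect(a, b)
}

union_set     <- combine_drivers(data_drivers, custom_drivers, "union")
intersect_set <- combine_drivers(data_drivers, custom_drivers, "intersect")

union_set      # genes found in either list
intersect_set  # genes found in both lists
```

The union keeps every candidate from either source, while the intersection restricts the analysis to genes supported by both.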

Value

result

Examples

data('ProcessedDataLIHC')
data('Driver_Genes')
AMARETTOinit <- AMARETTO_Initialize(MA_matrix = ProcessedDataLIHC$MA_matrix,
                                    CNV_matrix = ProcessedDataLIHC$CNV_matrix,
                                    MET_matrix = ProcessedDataLIHC$MET_matrix,
                                    NrModules = 20, VarPercentage = 60)
## Not run: 
AMARETTOinit <- AMARETTO_Initialize(MA_matrix = ProcessedDataLIHC$MA_matrix,
                                    CNV_matrix = NULL,
                                    MET_matrix = ProcessedDataLIHC$MET_matrix,
                                    Driver_list = Driver_Genes[['MSigDB']],
                                    NrModules = 20, VarPercentage = 60)

## End(Not run)

gevaertlab/AMARETTO documentation built on Feb. 19, 2019, 4:15 a.m.