GSEPD_INIT: Initialization

View source: R/GSEPD_INIT.R

GSEPD_INITR Documentation

Initialization

Description

Initializes the system, here you will pass in the count dataset and the sample metadata, before any GSEPD processing. Return value is a named list holding configurable parameters.

Usage

GSEPD_INIT(Output_Folder = "OUT", finalCounts = NULL, sampleMeta = NULL,
DESeqDataSet = NULL, renormalize = TRUE, vstBlind=TRUE,
  COLORS = c("green", "gray", "red"),
  C2T = "x" )

Arguments

Output_Folder

Specify the subdirectory to hold output/generated files. Defaults to "OUT".

finalCounts

This must be a matrix of count data, rows are transcript IDs and columns are samples.

sampleMeta

The sampleMeta matrix must be passed here. It is a data frame with a row for each sample in the finalCounts matrix. Some required columns are SHORTNAME= sample nicknames; Condition= treatment group for differential expression; and Sample are the column names of finalCounts. Other columns are permitted to facilitate subsetting (not automatically supported).

DESeqDataSet

Data may also be included in the format of a DESeqDataSet object, this is mutually exclusive of the finalCounts/sampleMeta scheme.

renormalize

Boolean performance flag. Default is TRUE, which causes a normalized counts table to be computed from your given raw reads 'finalCounts'. If you set this to FALSE, then the normCounts table is preloaded with the given finalCounts input matrix, short circuiting the built-in DESeq VST, and allowing the user to specify some sort of pre-normalized dataset.

vstBlind

Exposes the option from DESeq2 to change the way varianceStabilizingTransformation works. According to the DESeq manual: blind=TRUE should be used for comparing samples in an manner unbiased by prior information on samples ... If many of genes have large differences in counts due to the experimental design, it is important to set blind=FALSE for downstream analysis.

COLORS

A three element vector of colors to make the heatmaps, the first element is the under-expressed genes, and the third element is the over-expressed genes. Defaults to green-red through gray.

C2T

This symbol is used in the filenames to delimit sample groups.

Details

This function sets up the master parameter object, and therefore must be called first. This object includes all configurable parameters you can change before running the pipeline. Count data should be provided in the finalCounts matrix, with phenotype and sample data in the sampleMeta matrix. Optionally, these data may be packages in a DESeqDataSet instead. Rows with no expression are dropped at the point of loading.

Value

Returns the GSEPD named list master object, to be used in subsequent function calls.

See Also

GSEPD_Process

Examples

  data("IlluminaBodymap")
  data("IlluminaBodymapMeta")
  isoform_ids <- Name_to_RefSeq(c("HIF1A","EGFR","MYH7","CD33","BRCA2"))
  rows_of_interest <- unique( c( isoform_ids ,
                                 sample(rownames(IlluminaBodymap),
                                        size=2000,replace=FALSE)))
  G <- GSEPD_INIT(Output_Folder="OUT",
                finalCounts=round(IlluminaBodymap[rows_of_interest , ]),
                sampleMeta=IlluminaBodymapMeta,
                COLORS=c("green","black","red"))   
  #now ready to run:
  # G<-GSEPD_ProcessAll(G);
  

kstammits/rgsepd documentation built on Oct. 13, 2022, 6:43 p.m.