v_chmXcell: Main function for module: xCell

View source: R/module_xCell.R

Description

Extract, clean and reformat NGS data prepared by the v_prepareVdata_xCell function, then perform local Cell Type Enrichment Analysis (based on xCell), together with a set of in-house filtering processes and other statistical analyses chosen by the user.

Usage

v_chmXcell(
  outputFolderPath = "./_VK/_xCell/",
  ge.xcell.result = ge.xcell.result,
  filterOutlier = FALSE,
  filterOutlier_fromGroup = grpName,
  filterStringency = 0.1,
  filterNoTPM = FALSE,
  significanceTest = FALSE,
  significanceTest_inputForm = "raw",
  significanceTest_fdrq = FALSE,
  shadowGroup = list(FALSE, 5),
  calculateFC = FALSE,
  log10Threshold = c(0.3, -0.3),
  rowSliceOrder = c("Up-Regulated", "Within-Threshold", "Down-Regulated"),
  colSliceOrder = grpName,
  grpName_fc = grpName,
  grpName_pval = grpName,
  addBpAnno = FALSE,
  unsupervisedClustering = FALSE,
  colorScheme = list(mode = "continuous", breakpoints = seq(0, 1, 0.2)),
  resizeColSlicer = FALSE,
  resizeColSlicer_width = NULL
)

Arguments

outputFolderPath

character string, relative or absolute path to the output folder; a trailing slash "/" is required. Can be set to NULL, in which case no output file will be written to the file system. Default "./_VK/_xCell/" for module xCell.

ge.xcell.result

the main input, automatically extracted from the data prepared by v_prepareVdata_xCell; it can (and probably should) be omitted from the function call.

filterOutlier

logical, whether to filter outliers and exclude them from downstream analysis.

filterOutlier_fromGroup

character vector; if "filterOutlier" is TRUE, outliers are filtered from the selected groups only, and outliers in other groups are ignored. By default uses the internal character vector "grpName", which can usually be left unchanged.

filterStringency

numeric, the maximum proportion of outliers allowed in each selected group, from 0 to 1; default 0.1 (10%). The resulting number of allowed outliers is rounded down to an integer. For example, if a group contains 24 samples and filterStringency is 0.1, the number of allowed outliers in this group is 2 (24 * 0.1 = 2.4, rounded down). See Details for more information about how "filterOutlier" works.

filterNoTPM

logical, whether to filter out cell types whose Enrichment.Score is 0 in all samples. Setting this to TRUE is recommended, as it prevents an error in the significance test (error: data are essentially constant).
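A minimal sketch of the "filterNoTPM" rule in R, assuming (hypothetically) that enrichment scores are stored in a numeric matrix with cell types as rows and samples as columns. The matrix, its names and the helper drop_all_zero() are illustrative and are not part of vigilante.

```r
# Hypothetical enrichment-score matrix: rows = cell types, columns = samples
scores <- matrix(
  c(0.12, 0.30, 0.05,
    0.00, 0.00, 0.00,
    0.00, 0.02, 0.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CellType1", "CellType2", "CellType3"),
                  c("S1", "S2", "S3"))
)

# Hypothetical helper: keep cell types with a non-zero score in at least one sample
drop_all_zero <- function(m) {
  keep <- apply(m, 1, function(x) any(x != 0))
  m[keep, , drop = FALSE]
}

filtered <- drop_all_zero(scores)
rownames(filtered)  # "CellType1" "CellType3"
```

A row that is 0 in every sample is constant, which is the situation that triggers the "data are essentially constant" error in tests such as t.test().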

significanceTest

logical, whether to perform a significance test; the type of test is adjusted automatically based on the input data and other settings (e.g. the number of groups).

significanceTest_inputForm

character, one of c("log10", "log2", "raw"): whether to use log10-transformed, log2-transformed or raw values in the significance test. Default "raw" for module xCell; preferably left unchanged.
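The three input forms amount to the transformation below, sketched with a hypothetical helper transform_input() that is not part of vigilante. Note that in R, log10(0) and log2(0) return -Inf, so zero enrichment scores need care if a log form is chosen.

```r
# Hypothetical helper: transform values according to an input-form setting
transform_input <- function(x, inputForm = c("raw", "log10", "log2")) {
  inputForm <- match.arg(inputForm)
  switch(inputForm,
         raw   = x,
         log10 = log10(x),
         log2  = log2(x))
}

x <- c(0, 1, 10, 100)
transform_input(x, "log10")  # -Inf 0 1 2
```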

significanceTest_fdrq

logical, whether to calculate and display the false discovery rate (FDR) q-value; the user-set value may be automatically modified (with a notice) due to interplay with other arguments.
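A common way to turn p-values into FDR-adjusted values is the Benjamini-Hochberg procedure, available in base R as p.adjust(method = "BH"). Whether vigilante uses this procedure or another estimator (such as Storey's q-value) is not stated here; the sketch below only illustrates the BH adjustment on made-up p-values.

```r
# Made-up p-values, e.g. one per cell type
p <- c(0.001, 0.010, 0.020, 0.040, 0.500)

# Benjamini-Hochberg adjustment (one common way to obtain FDR values)
q <- p.adjust(p, method = "BH")
q  # 0.005 0.025 0.0333 0.050 0.500
```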

shadowGroup

list(logical, numeric): [[1]] whether to create shadow (virtual) groups for each individual sample (grpName will be modified based on aliaseID); [[2]] the number of shadow groups per sample. "shadowGroup" can only be used when each original group contains only one sample; it only applies to "significanceTest" and will disable "calculateFC". The user-set value may be automatically modified (with a notice) due to interplay with other arguments.

calculateFC

logical, whether to calculate log10 fold-change values; the user-set value may be automatically modified (with a notice) due to interplay with other arguments.

log10Threshold

numeric vector, the upper and lower cutoffs, in log10 form, used to assign fold-change levels ("Up-Regulated", "Within-Threshold", "Down-Regulated"). Default c(0.3, -0.3), which corresponds to the generally accepted non-log fold-change thresholds of two-fold and half-fold, respectively (log10(2) is approximately 0.301).
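The fold-change levels can be sketched as a three-way classification of log10 fold-change values. The helper classify_fc() is hypothetical, and treating values exactly at a cutoff as "Within-Threshold" is an assumption of this sketch.

```r
# Hypothetical helper: assign fold-change levels from log10 fold-change values.
# Assumption: values exactly at a cutoff count as "Within-Threshold".
classify_fc <- function(log10fc, log10Threshold = c(0.3, -0.3)) {
  upper <- log10Threshold[1]
  lower <- log10Threshold[2]
  ifelse(log10fc > upper, "Up-Regulated",
         ifelse(log10fc < lower, "Down-Regulated", "Within-Threshold"))
}

classify_fc(c(0.5, 0.1, -0.4, 0.3))
# "Up-Regulated" "Within-Threshold" "Down-Regulated" "Within-Threshold"
```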

rowSliceOrder

character vector, the order (from top to bottom) of fold-change levels (if applicable) shown in the output plots and files. Default c("Up-Regulated", "Within-Threshold", "Down-Regulated") for module xCell; preferably left unchanged.

colSliceOrder

character vector, the order (from left to right) of groups shown in the output plots and files. It should be set to the same value in every function called within the same module (e.g., v_prepareVdata_xCell and v_chmXcell). By default uses the internal character vector "grpName", which can usually be left unchanged. Groups not included in "colSliceOrder" will also be excluded from the output plots and files (in most cases), and from certain analysis steps (depending on the situation).
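One way to picture how "colSliceOrder" both orders and filters groups is an R factor whose levels follow colSliceOrder: samples from unlisted groups become NA and drop out. This is an illustrative sketch with made-up group labels, not vigilante's implementation.

```r
# Made-up group label for each sample (column)
grp <- c("B", "A", "C", "A", "B")
colSliceOrder <- c("A", "B")  # group "C" deliberately left out

# Factor levels define the order; samples from unlisted groups become NA
f <- factor(grp, levels = colSliceOrder)
kept <- which(!is.na(f))
col_order <- kept[order(f[kept])]
col_order  # 2 4 1 5: group A samples first, then group B; "C" excluded
```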

grpName_fc

character vector, the two groups selected for fold-change calculation. Order matters: the first group is used as the denominator and the second as the numerator.
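A minimal sketch of the fold-change direction described above, assuming (hypothetically) that each group is summarized by its mean score; the helper log10_fc() and the data are illustrative only.

```r
# Hypothetical helper: log10 fold change between two groups for one cell type.
# Assumption: each group is summarized by its mean; vigilante may differ.
log10_fc <- function(denominator, numerator) {
  log10(mean(numerator) / mean(denominator))
}

grpA <- c(0.10, 0.20, 0.30)  # first group in grpName_fc -> denominator
grpB <- c(0.40, 0.40, 0.40)  # second group in grpName_fc -> numerator
log10_fc(grpA, grpB)  # log10(0.4 / 0.2), about 0.301
```

If the denominator group's mean is 0, the ratio is infinite and log10() returns Inf, so all-zero groups need care.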

grpName_pval

character vector, groups selected for the significance test when "significanceTest" is TRUE. Order does not matter, but the number of groups may affect the type of test. If provided, at least two groups are required.
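This page does not say which tests vigilante selects. As a hedged illustration of how the number of groups can change the test, the sketch below assumes a Wilcoxon rank-sum test for two groups and a Kruskal-Wallis test for three or more; both are standard base-R functions, but this choice and the helper group_pvalue() are assumptions.

```r
# Hypothetical helper: pick a test by the number of groups.
# Assumption: Wilcoxon for 2 groups, Kruskal-Wallis for 3 or more.
group_pvalue <- function(values_by_group) {
  stopifnot(length(values_by_group) >= 2)
  if (length(values_by_group) == 2) {
    wilcox.test(values_by_group[[1]], values_by_group[[2]])$p.value
  } else {
    kruskal.test(values_by_group)$p.value
  }
}

# Made-up scores for one cell type in three groups
grp_scores <- list(A = c(0.1, 0.2, 0.3),
                   B = c(0.5, 0.6, 0.7),
                   C = c(0.9, 1.0, 1.1))
group_pvalue(grp_scores[c("A", "B")])  # two groups -> Wilcoxon
group_pvalue(grp_scores)               # three groups -> Kruskal-Wallis
```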

addBpAnno

logical, whether to add a boxplot annotation to the left of the main heatmap; the user-set value may be automatically modified (with a notice) due to interplay with other arguments.

unsupervisedClustering

logical, whether to perform unsupervised clustering (currently only the Euclidean distance method is supported). If TRUE, it overrides "rowSliceOrder" and "colSliceOrder", and disables "addBpAnno".
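Unsupervised clustering with Euclidean distance can be sketched in base R with dist() and hclust(). The random matrix is made up, and complete linkage (hclust()'s default) is an assumption; vigilante may use a different linkage or delegate clustering to its heatmap backend.

```r
# Made-up enrichment-score matrix: rows = cell types, columns = samples
set.seed(1)
m <- matrix(runif(20), nrow = 4,
            dimnames = list(paste0("Cell", 1:4), paste0("S", 1:5)))

# Euclidean distances; hclust() uses complete linkage by default (assumption)
row_hc <- hclust(dist(m, method = "euclidean"))
col_hc <- hclust(dist(t(m), method = "euclidean"))

rownames(m)[row_hc$order]  # clustered row order
colnames(m)[col_hc$order]  # clustered column order
```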

colorScheme

list(character, numeric vector), the color scheme for the heatmap main body: [[1]] mode, one of c("continuous", "discrete"); [[2]] breakpoints, a numeric vector of breakpoints ranging from 0 to 1, inclusive. Breakpoints are only used in "discrete" mode and are ignored in "continuous" mode. Recommended values are list(mode = "continuous", breakpoints = NULL) and list(mode = "discrete", breakpoints = seq(from = 0, to = 1, by = 0.2)). By default, "continuous" mode is used, so the default breakpoints shown in Usage (seq(0, 1, 0.2)) are ignored; this is equivalent to the first recommended value, list(mode = "continuous", breakpoints = NULL).
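In "discrete" mode, each value in [0, 1] presumably falls into one of the intervals defined by the breakpoints. A hedged base-R sketch of that binning, using findInterval() and a palette from hcl.colors(); the helper to_discrete_color() and the palette choice are hypothetical.

```r
# Hypothetical discrete color mapping: bin values in [0, 1] by breakpoints
breakpoints <- seq(0, 1, 0.2)
pal <- hcl.colors(length(breakpoints) - 1, "YlOrRd")  # one color per interval

to_discrete_color <- function(x, breakpoints, pal) {
  # all.inside = TRUE keeps 0 in the first bin and 1 in the last bin
  bin <- findInterval(x, breakpoints, rightmost.closed = TRUE, all.inside = TRUE)
  pal[bin]
}

to_discrete_color(c(0, 0.25, 0.99, 1), breakpoints, pal)  # bins 1, 2, 5, 5
```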

resizeColSlicer

logical, whether to resize each column slicer shown on the heatmap; very useful when the number of samples varies greatly between groups (e.g. 10 samples in group A, 50 in group B, and 300 in group C). By default, the width of each column slicer is proportional to its sample size; 'resizeColSlicer' lets the user customize and balance the layout of all column slicers.

resizeColSlicer_width

integer vector, should have the same length as the number of column slicers (including the additional statistical analysis result columns); the values are used as relative rather than absolute widths. For example, suppose there are 3 groups (A/B/C) with 10/50/300 samples, respectively, and 2 additional statistical analysis result columns, giving 5 column slicers in total. Setting 'resizeColSlicer_width' to rep(1, 5) makes all 5 column slicers the same width. Usually, however, the user may want the main body to be wider and the sidebar narrower; in that case, 'resizeColSlicer_width' can be set to c(rep(6, 3), rep(1, 2)), so that the 3 groups of the main body have the same width while each additional statistical analysis column is one-sixth the width of a group slicer.
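Reading the widths as relative means that each slicer's share of the total width is its value divided by the sum of all values. This normalization is an assumed interpretation of "relative", shown here for the example above.

```r
# Relative widths from the example: 3 group slicers, 2 statistics columns
resizeColSlicer_width <- c(rep(6, 3), rep(1, 2))

# Assumed interpretation: share of total width = value / sum(values)
fractions <- resizeColSlicer_width / sum(resizeColSlicer_width)
fractions                  # 0.30 0.30 0.30 0.05 0.05
fractions[4] / fractions[1]  # each statistics column is 1/6 of a group slicer
```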

Details

Here is more information about how "filterOutlier" works. To begin with, vigilante takes the generally accepted definition of outliers: observations that lie more than 1.5 * IQR (interquartile range) below the 25th percentile or above the 75th percentile are regarded as outliers.
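The 1.5 * IQR rule can be sketched in base R as follows. The helper is_outlier() is hypothetical, and the use of R's default quantile() definition (type 7) is an assumption; other quantile definitions, such as the hinges used by boxplot.stats(), can flag slightly different points.

```r
# Hypothetical helper: flag values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
# Assumption: quantiles from R's default quantile() (type 7).
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
which(is_outlier(x))  # 9 (Q1 = 3, Q3 = 7, upper fence = 13)
```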

Assume a project contains 100 samples divided into 3 groups (Group 1: 24, Group 2: 36, Group 3: 40). The data contain 10,000 observations per sample, and vigilante detects the following numbers of outliers for observations A and B: Group 1: A 3, B 2; Group 2: A 0, B 3; Group 3: A 0, B 4.

Under the default "filterStringency" of 0.1, observation A will be excluded from downstream analysis, because its 3 outliers in Group 1 exceed the number of allowed outliers in this group (24 * 0.1 = 2.4, rounded down to 2), even though A has no outliers in Group 2 or 3. On the other hand, although observation B has 2/3/4 outliers in Group 1/2/3, B will still be kept for downstream analysis, because these numbers do not exceed the number of allowed outliers in the respective groups (2/3/4).

Moreover, if "filterStringency" is changed to 0.05, observation B will be excluded from downstream analysis as well; if "filterStringency" is changed to 0.15, observation A will no longer be excluded from downstream analysis.
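The worked example can be reproduced with a short sketch that applies the rule "exclude an observation if its outlier count exceeds floor(group size * filterStringency) in any selected group". The counts come from the example above; the helper excluded() is hypothetical and not part of vigilante.

```r
# Outlier counts from the example (rows = observations, columns = groups)
outlier_counts <- rbind(A = c(3, 0, 0), B = c(2, 3, 4))
group_sizes <- c(Group1 = 24, Group2 = 36, Group3 = 40)

# Hypothetical helper: observations exceeding the allowance in any group
excluded <- function(counts, sizes, filterStringency) {
  allowed <- floor(sizes * filterStringency)  # rounded down to integer
  rownames(counts)[apply(counts, 1, function(n) any(n > allowed))]
}

excluded(outlier_counts, group_sizes, 0.10)  # "A"
excluded(outlier_counts, group_sizes, 0.05)  # "A" "B"
excluded(outlier_counts, group_sizes, 0.15)  # character(0)
```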

Therefore, it is up to the user to decide whether or not to filter outliers, and how much stringency should be imposed on the filtering process.

Value

NULL. When a valid outputFolderPath is provided, analysis results and output plots are generated and saved in that location; otherwise the function stops and nothing is written to the file system.


yilixu/vigilante documentation built on June 4, 2021, 5:07 a.m.