v_chmTranscript: Main function for module: CHM_T (Transcript)


View source: R/module_CHM_T.R

Description

Extract, clean and reformat NGS data prepared by the v_prepareVdata_CHM_T function, then perform local Transcript/Variant/Isoform Analysis, along with a set of in-house filtering processes and other statistical analyses based on the user's choices.

Usage

v_chmTranscript(
  outputFolderPath = "./_VK/_CHM/",
  te.list.c = te.list.c,
  filterOutlier = FALSE,
  filterOutlier_fromGroup = grpName,
  filterStringency = 0.1,
  filterNoTPM = FALSE,
  significanceTest = FALSE,
  significanceTest_inputForm = "log10",
  significanceTest_fdrq = FALSE,
  shadowGroup = list(FALSE, 5),
  calculateFC = FALSE,
  log10Threshold = c(0.3, -0.3),
  rowSliceOrder = te_loader_par[["te_rowSlice"]],
  colSliceOrder = grpName,
  grpName_fc = grpName,
  grpName_pval = grpName,
  sortRowByEV = FALSE,
  sortColByEV = FALSE,
  addBpAnno = FALSE,
  addEvAnno = FALSE,
  unsupervisedClustering = FALSE,
  showCOMBIonly = TRUE
)

Arguments

outputFolderPath

string, relative or absolute path to the output folder, trailing slash "/" required, can be set to NULL (no output file will be written to the file system), default "./_VK/_CHM/" for module CHM_T.

te.list.c

main input, automatically extracted from the prepared data, can (or probably should) be omitted in function call.

filterOutlier

logic, whether to filter outliers and exclude them from downstream analysis.

filterOutlier_fromGroup

character vector, if "filterOutlier" is set to TRUE, filter outliers from the selected groups and ignore outliers from other groups. By default will use the internal character vector "grpName" and can be left unchanged.

filterStringency

numeric, the ratio that outliers can be allowed in each selected group, from 0 to 1, default 0.1 (10%), calculation will be rounded down to integer. For example, if there are 24 samples in a group and filterStringency is set to 0.1, the number of allowed outliers in this group will be 2. See Details for more information about how "filterOutlier" works.

filterNoTPM

logic, whether to filter out genes that have expression value = 0 in all samples, preferably TRUE to prevent error in significance test (error: data are essentially constant).
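The kind of filtering described for "filterNoTPM" can be sketched in base R as follows. This is only an illustration under stated assumptions, not vigilante's internal code: the matrix `expr` and its gene and sample names are hypothetical.

```r
# Hypothetical expression matrix: rows are genes, columns are samples.
# matrix() fills column by column, so each c(...) line below is one sample.
expr <- matrix(
  c(5.2, 0, 3.1,   # sample s1
    4.8, 0, 0,     # sample s2
    6.0, 0, 2.7),  # sample s3
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

# Keep a gene only if it has a non-zero value in at least one sample.
keep <- rowSums(expr != 0) > 0
expr_filtered <- expr[keep, , drop = FALSE]

rownames(expr_filtered)
```

Here geneB is zero in every sample and is dropped, while geneC is kept because it is zero in only one sample.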

significanceTest

logic, whether to perform significance test, the type of test will be automatically adjusted based on the input data and other settings (e.g. number of groups).

significanceTest_inputForm

character, choose one from c("log10", "log2", "raw"), use log10, log2 or raw value to perform significance test. By default will use "log10" for module CHM_T, and should preferably be left unchanged.

significanceTest_fdrq

logic, whether to calculate and display false discovery rate q-value, user-set value may be automatically modified (with notice) due to interplay.

shadowGroup

list(logic, numeric), [[1]] whether to create shadow/virtual group for each individual sample (grpName will be modified based on aliaseID); [[2]] number of shadow groups per sample. "shadowGroup" can only be used when each original group only contains one individual sample; only works for "significanceTest", will disable "calculateFC"; user-set value may be automatically modified (with notice) due to interplay.

calculateFC

logic, whether to calculate log10 fold-change value, user-set value may be automatically modified (with notice) due to interplay.

log10Threshold

numeric vector, upper and lower cutoff points in log10 form for deciding fold-change levels ("Up-Regulated", "With-Threshold", "Down-Regulated"). By default will use c(0.3, -0.3), which corresponds approximately to the generally accepted fold-change thresholds in non-log form (two-fold and half-fold, respectively).
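As a rough illustration of how the default cutoffs relate to linear fold-change, the sketch below converts them back from log10 form and assigns the three levels. The helper `classify_fc` is hypothetical and not part of vigilante. Treating values exactly on a cutoff as "With-Threshold" is an assumption, since the source does not say how boundaries are handled.

```r
log10Threshold <- c(0.3, -0.3)

# Back-transform to linear scale: close to two-fold and half-fold.
linear_cutoffs <- 10^log10Threshold

# Hypothetical helper: label log10 fold-change values by level.
# Assumption: values exactly on a cutoff count as "With-Threshold".
classify_fc <- function(log10_fc, threshold = c(0.3, -0.3)) {
  upper <- max(threshold)
  lower <- min(threshold)
  ifelse(log10_fc > upper, "Up-Regulated",
         ifelse(log10_fc < lower, "Down-Regulated", "With-Threshold"))
}

classify_fc(c(0.5, 0.1, -0.4))
```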

rowSliceOrder

character vector, the order (from top to bottom) of genes (associated with transcripts) to be shown on the output plots and files. By default will use the internal character vector "te_loader_par[["te_rowSlice"]]" for module CHM_T, and preferably be left unchanged.

colSliceOrder

character vector, the order (from left to right) of groups to be shown on the output plots and files, should be set to the same value when called from different functions within the same module (e.g., v_prepareVdata_CHM_G and v_chmSignaturePanel). By default will use the internal character vector "grpName" and can be left unchanged. Groups not included in the "colSliceOrder" will also be excluded from the output plots and files (in most cases), and from certain analysis process (depending on the situation).

grpName_fc

character vector, two groups selected for fold-change calculation. Order matters: the first group will be used as the denominator, and the second as the numerator.
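The effect of group order can be seen in a small sketch. The group names and per-group values are hypothetical. Summarising each group by a single value is an assumption made for illustration, as the source does not state how vigilante aggregates samples within a group.

```r
# Hypothetical group names: first is the denominator, second the numerator.
grpName_fc <- c("Control", "Treated")

# Hypothetical summary expression value per group (assumed, for illustration).
group_value <- c(Control = 10, Treated = 40)

# log10 fold-change = log10(second group / first group)
log10_fc <- log10(group_value[[grpName_fc[2]]] / group_value[[grpName_fc[1]]])

# Reversing the order of the two groups flips the sign of the result.
grpName_rev <- rev(grpName_fc)
log10_fc_reversed <- log10(group_value[[grpName_rev[2]]] / group_value[[grpName_rev[1]]])

c(log10_fc, log10_fc_reversed)
```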

grpName_pval

character vector, groups selected for significance test if "significanceTest" is set to TRUE. Order does not matter, but number of groups may affect the type of test. If provided, should have at least two groups.

sortRowByEV

logic, whether to sort row (from top to bottom, within each row slice) by Expression Value (from high to low); currently not implemented.

sortColByEV

logic, whether to sort column (from left to right, within each column slice) by Expression Value (from high to low); currently not implemented.

addBpAnno

logic, whether to add boxplot annotation to the left of the main heatmap, user-set value may be automatically modified (with notice) due to interplay.

addEvAnno

logic, whether to add barplot annotation (Expression Value) to the top of the main heatmap, user-set value may be automatically modified (with notice) due to interplay.

unsupervisedClustering

logic, whether to perform unsupervised clustering (currently supports the Euclidean distance method); if TRUE, will override "rowSliceOrder" and "colSliceOrder", as well as disable "addBpAnno" and "addEvAnno".

showCOMBIonly

logic, whether to perform transcript analysis and show results for all genes combined only (TRUE), or to separate the analysis and results by each individual gene (FALSE).

Details

Here is more information about how "filterOutlier" works. To begin with, vigilante takes the generally accepted definition of outliers: observations that lie more than 1.5 * IQR (interquartile range) below the first quartile (25th percentile) or above the third quartile (75th percentile) are regarded as outliers.
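The 1.5 * IQR rule can be sketched in base R as below. The helper `flag_outliers` and the sample values are hypothetical. The use of `quantile()` with its default settings is an assumption, as the source does not say how vigilante computes the quartiles.

```r
# Hypothetical helper: TRUE for values outside the 1.5 * IQR fences.
flag_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])          # 1.5 * IQR
  x < q[1] - fence | x > q[2] + fence   # below lower fence or above upper fence
}

# Hypothetical values for one observation across the samples of a group.
values <- c(10, 12, 11, 13, 12, 11, 50)

# Positions of the flagged values.
which(flag_outliers(values))
```

With these values only the last one (50) lies outside the fences.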

Assume a project contains 100 samples divided into 3 groups (Group 1: 24, Group 2: 36, Group 3: 40). The data contain 10,000 observations per sample, and vigilante has detected the following numbers of outliers for observations A and B: Group 1: A 3, B 2; Group 2: A 0, B 3; Group 3: A 0, B 4.

Under the default "filterStringency" of 0.1, observation A will be excluded from downstream analysis, because A has 3 outliers in Group 1, which exceeds the number of allowed outliers in this group (24 * 0.1 = 2.4, rounded down to 2), even though there is no outlier of A in Group 2 or 3. On the other hand, though observation B has 2/3/4 outliers in Group 1/2/3, B will still be kept for downstream analysis because these numbers do not exceed the number of allowed outliers in their respective groups (2, 3 and 4).

Moreover, if "filterStringency" is changed to 0.05, observation B will be excluded from downstream analysis as well; if "filterStringency" is changed to 0.15, observation A will no longer be excluded from downstream analysis.
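The worked example above can be reproduced with a short sketch. The helper `is_excluded` and the variable names are hypothetical. The sketch only mirrors the rule as described here (allowance rounded down, exclusion when any group exceeds it) and is not vigilante's implementation.

```r
# Group sizes and detected outlier counts from the example above.
group_size <- c(G1 = 24, G2 = 36, G3 = 40)
outlier_count <- rbind(
  A = c(G1 = 3, G2 = 0, G3 = 0),
  B = c(G1 = 2, G2 = 3, G3 = 4)
)

# Hypothetical helper: an observation is excluded if its outlier count
# exceeds the allowed number (rounded down) in at least one group.
is_excluded <- function(counts, sizes, stringency) {
  allowed <- floor(sizes * stringency)
  apply(counts, 1, function(row) any(row > allowed))
}

is_excluded(outlier_count, group_size, 0.10)  # A excluded, B kept
is_excluded(outlier_count, group_size, 0.05)  # A and B excluded
is_excluded(outlier_count, group_size, 0.15)  # A and B kept
```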

Therefore, it is up to the user to decide whether or not to filter outliers, and how much stringency should be imposed on the filtering process.

Value

NULL. When a valid outputFolderPath is provided, analysis results and output plots are generated and saved in that location; otherwise the function run stops and nothing is written to the file system.


yilixu/vigilante documentation built on June 4, 2021, 5:07 a.m.