run | R Documentation |
Function doing the actual analysis before calling the plotting functions.
run(
infercnv_obj,
cutoff = 1,
min_cells_per_gene = 3,
out_dir = NULL,
window_length = 101,
smooth_method = c("pyramidinal", "runmeans", "coordinates"),
num_ref_groups = NULL,
ref_subtract_use_mean_bounds = TRUE,
cluster_by_groups = TRUE,
cluster_references = TRUE,
k_obs_groups = 1,
hclust_method = "ward.D2",
max_centered_threshold = 3,
scale_data = FALSE,
HMM = FALSE,
HMM_transition_prob = 1e-06,
HMM_report_by = c("subcluster", "consensus", "cell"),
HMM_type = c("i6", "i3"),
HMM_i3_pval = 0.05,
HMM_i3_use_KS = FALSE,
BayesMaxPNormal = 0.5,
sim_method = "meanvar",
sim_foreground = FALSE,
reassignCNVs = TRUE,
analysis_mode = c("subclusters", "samples", "cells"),
tumor_subcluster_partition_method = c("leiden", "random_trees", "qnorm", "pheight",
"qgamma", "shc"),
tumor_subcluster_pval = 0.1,
k_nn = 20,
leiden_method = c("PCA", "simple"),
leiden_function = c("CPM", "modularity"),
leiden_resolution = "auto",
leiden_method_per_chr = c("simple", "PCA"),
leiden_function_per_chr = c("modularity", "CPM"),
leiden_resolution_per_chr = 1,
per_chr_hmm_subclusters = FALSE,
per_chr_hmm_subclusters_references = FALSE,
z_score_filter = 0.8,
denoise = FALSE,
noise_filter = NA,
sd_amplifier = 1.5,
noise_logistic = FALSE,
outlier_method_bound = "average_bound",
outlier_lower_bound = NA,
outlier_upper_bound = NA,
final_scale_limits = NULL,
final_center_val = NULL,
debug = FALSE,
num_threads = 4,
plot_steps = FALSE,
inspect_subclusters = TRUE,
resume_mode = TRUE,
png_res = 300,
plot_probabilities = TRUE,
save_rds = TRUE,
save_final_rds = TRUE,
diagnostics = FALSE,
remove_genes_at_chr_ends = FALSE,
prune_outliers = FALSE,
mask_nonDE_genes = FALSE,
mask_nonDE_pval = 0.05,
test.use = "wilcoxon",
require_DE_all_normals = "any",
hspike_aggregate_normals = FALSE,
no_plot = FALSE,
no_prelim_plot = FALSE,
write_expr_matrix = FALSE,
write_phylo = FALSE,
output_format = "png",
plot_chr_scale = FALSE,
chr_lengths = NULL,
useRaster = TRUE,
up_to_step = 100
)
infercnv_obj |
An infercnv object populated with raw count data |
cutoff |
Cut-off for the min average read counts per gene among reference cells. (default: 1) |
min_cells_per_gene |
minimum number of reference cells requiring expression measurements to include the corresponding gene. default: 3 |
out_dir |
path to directory to deposit outputs (default: NULL, but a non-NULL value is required) |
window_length |
Length of the window for the moving average (smoothing). Should be an odd integer. (default: 101) |
smooth_method |
Method to use for smoothing: c(runmeans,pyramidinal,coordinates) default: pyramidinal |
num_ref_groups |
The number of reference groups or a list of indices for each group of reference indices in relation to reference_obs. (default: NULL) |
ref_subtract_use_mean_bounds |
Determine means separately for each ref group, then remove intensities within bounds of means (default: TRUE). Otherwise, uses mean of the means across groups. |
cluster_by_groups |
If observations are defined according to groups (i.e. patients), each group of cells will be clustered separately. (default: TRUE, per the usage above; set to FALSE to use the k_obs_groups setting instead) |
cluster_references |
Whether to cluster references within their annotations or not. (dendrogram not displayed) (default: TRUE) |
k_obs_groups |
Number of groups in which to break the observations. (default: 1) |
hclust_method |
Method used for hierarchical clustering of cells. Valid choices are: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid". (default: "ward.D2") |
max_centered_threshold |
The maximum value allowed after centering. Also sets a lower bound of -1 * this value. Can be set to a numeric value or to "auto" to bound by the mean bounds across cells. Set to NA to turn off. (default: 3) |
scale_data |
perform Z-scaling of log-transformed data (default: FALSE). This may be turned on if you have very different kinds of data for your normal and tumor samples, for example when you need to use GTEx representative normal expression profiles rather than normal single cell data from your own experiment. |
HMM |
when set to TRUE, runs HMM to predict CNV level (default: FALSE) |
HMM_transition_prob |
transition probability in HMM (default: 1e-6) |
HMM_report_by |
cell, consensus, subcluster (default: subcluster) Note, reporting is performed entirely separately from the HMM prediction. So, you can predict on subclusters, but get per-cell level reporting (more voluminous output). |
HMM_type |
HMM model type. Options: (i6 or i3): i6: infercnv 6-state model (0, 0.5, 1, 1.5, 2, >2) where state emissions are calibrated based on simulated CNV levels. i3: infercnv 3-state model (del, neutral, amp) configured based on normal cells and HMM_i3_pval |
HMM_i3_pval |
p-value for HMM i3 state overlap (default: 0.05) |
HMM_i3_use_KS |
boolean: use the KS test statistic to estimate mean of amp/del distributions (à la HoneyBadger). (default: FALSE, per the usage above) |
BayesMaxPNormal |
maximum P(Normal) allowed for a CNV prediction according to BayesNet. (default=0.5, note zero turns it off) |
sim_method |
method for calibrating CNV levels in the i6 HMM (default: 'meanvar') |
sim_foreground |
don't use... for debugging, developer option. |
reassignCNVs |
(boolean) Given the CNV associated probability of belonging to each possible state, reassign the state assignments made by the HMM to the state that has the highest probability. (default: TRUE) |
analysis_mode |
options(samples|subclusters|cells), Grouping level for image filtering or HMM predictions. default: subclusters, the first value in the usage above (samples is fastest, but subclusters is ideal) |
tumor_subcluster_partition_method |
method for defining tumor subclusters. Options described here: 'leiden', 'random_trees', 'qnorm' (the usage above also accepts 'pheight', 'qgamma' and 'shc'). leiden: runs a nearest neighbor search, where communities are then partitioned with the Leiden algorithm. random_trees: slow, uses permutation statistics with tree construction. qnorm: defines tree height based on the quantile defined by the tumor_subcluster_pval. |
tumor_subcluster_pval |
max p-value for defining a significant tumor subcluster (default: 0.1) |
k_nn |
number k of nearest neighbors to search for when using the Leiden partition method for subclustering (default: 20) |
leiden_method |
Method used to generate the graph on which the Leiden algorithm is applied, one of "PCA" or "simple". (default: "PCA") |
leiden_function |
Whether to use the Constant Potts Model (CPM) or modularity in igraph. Must be either "CPM" or "modularity". (default: "CPM") |
leiden_resolution |
resolution parameter for the Leiden algorithm using the CPM quality score (default: auto) |
leiden_method_per_chr |
Method used to generate the graph on which the Leiden algorithm is applied for the per chromosome subclustering, one of "PCA" or "simple". (default: "simple") |
leiden_function_per_chr |
Whether to use the Constant Potts Model (CPM) or modularity in igraph for the per chromosome subclustering. Must be either "CPM" or "modularity". (default: "modularity") |
leiden_resolution_per_chr |
resolution parameter for the Leiden algorithm for the per chromosome subclustering (default: 1) |
per_chr_hmm_subclusters |
Run subclustering per chromosome over all cells combined to run the HMM on those subclusters instead. Only applicable when using Leiden subclustering. This should provide enough definition in the predictions while avoiding subclusters that are too small thus providing less evidence to work with. (default: FALSE) |
per_chr_hmm_subclusters_references |
Whether the per chromosome subclustering should also be done on references, which should not have as much variation as observations. (default = FALSE) |
z_score_filter |
Z-score used as a threshold to filter genes used for subclustering. Applied based on reference genes to automatically ignore genes with high expression variability such as MHC genes. (default: 0.8) |
denoise |
If TRUE, turns on denoising according to the options below (default: FALSE) |
noise_filter |
Values within +- this amount of the reference cell mean will be set to zero (whitening effect). (default: NA, in which case sd_amplifier below is used instead) |
sd_amplifier |
Noise is defined as mean(reference_cells) +- sdev(reference_cells) * sd_amplifier default: 1.5 |
noise_logistic |
use the noise_filter or sd_amplifier based threshold (whichever is invoked) as the midpoint in a logistic model for downscaling values close to the mean. (default: FALSE) |
outlier_method_bound |
Method to use for bounding outlier values. (default: "average_bound") Will preferentially use outlier_lower_bound and outlier_upper_bound if set. |
outlier_lower_bound |
Outliers below this lower bound will be set to this value. |
outlier_upper_bound |
Outliers above this upper bound will be set to this value. |
final_scale_limits |
The scale limits for the final heatmap output by the run() method. Default "auto". Alt, c(low,high) |
final_center_val |
Center value for final heatmap output by the run() method. |
debug |
If true, output debug level logging. |
num_threads |
(int) number of threads for parallel steps (default: 4) |
plot_steps |
If true, saves infercnv objects and plots data at the intermediate steps. |
inspect_subclusters |
If true, plot subclusters as annotations after the subclustering step to easily see if the subclustering options are good. (default = TRUE) |
resume_mode |
leverage pre-computed and stored infercnv objects where possible. (default=TRUE) |
png_res |
Resolution for png output. |
plot_probabilities |
option to plot posterior probabilities (default: TRUE) |
save_rds |
Whether to save the current step object results as an .rds file (default: TRUE) |
save_final_rds |
Whether to save the final object results as an .rds file (default: TRUE) |
diagnostics |
option to create diagnostic plots after running the Bayesian model (default: FALSE) |
remove_genes_at_chr_ends |
experimental option: If true, removes the window_length/2 genes at both ends of the chromosome. |
prune_outliers |
Define outliers loosely as those that exceed the mean boundaries among all cells. These are set to the bounds. |
mask_nonDE_genes |
If true, sets genes not significantly differentially expressed between tumor/normal to the mean value for the complete data set (default: FALSE) |
mask_nonDE_pval |
p-value threshold for defining statistically significant DE genes between tumor/normal (default: 0.05) |
test.use |
statistical test to use. (default: "wilcoxon") Alternatives include "perm" or "t". |
require_DE_all_normals |
If mask_nonDE_genes is set, those genes will be masked only if they are found as DE according to test.use and mask_nonDE_pval in each of the comparisons to normal cells. Options: "any", "most", "all" (default: "any") |
hspike_aggregate_normals |
instead of trying to model the different normal groupings individually, just merge them in the hspike. |
no_plot |
don't make any of the images. Instead, generate all non-image outputs as part of the run. (default: FALSE) |
no_prelim_plot |
don't make the preliminary infercnv image (default: FALSE) |
write_expr_matrix |
Whether to write text files with the content of matrices when generating plots (default: FALSE) |
write_phylo |
Whether to write newick strings of the dendrograms displayed on the left side of the heatmap to file (default: FALSE) |
output_format |
Output format for the figure. Choose between "png", "pdf" and NA. NA means to only write the text outputs without generating the figure itself. (default: "png") |
plot_chr_scale |
Whether to scale the chromosome width on the heatmap based on their actual size rather than just the number of expressed genes. |
chr_lengths |
A named list of chromosome lengths to use when plot_chr_scale=TRUE; otherwise each chromosome's size is assumed to be its last gene's stop position + 10k bp |
useRaster |
Whether to use rasterization for drawing heatmap. Only disable if it produces an error as it is much faster than not using it. (default: TRUE) |
up_to_step |
run() only up to this exact step number (default: 100; the process currently has 23 steps, so the default runs all of them) |
infercnv_obj containing filtered and transformed data
data(infercnv_data_example)
data(infercnv_annots_example)
data(infercnv_genes_example)
infercnv_object_example <- infercnv::CreateInfercnvObject(raw_counts_matrix=infercnv_data_example,
                                                          gene_order_file=infercnv_genes_example,
                                                          annotations_file=infercnv_annots_example,
                                                          ref_group_names=c("normal"))
infercnv_object_example <- infercnv::run(infercnv_object_example,
                                         cutoff=1,
                                         out_dir=tempfile(),
                                         cluster_by_groups=TRUE,
                                         denoise=TRUE,
                                         HMM=FALSE,
                                         num_threads=2,
                                         analysis_mode="samples",
                                         no_plot=TRUE)
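The example above turns on denoise=TRUE with the default sd_amplifier. As a rough illustration of the noise band described for sd_amplifier (mean(reference_cells) +- sdev(reference_cells) * sd_amplifier), the following base R sketch may help. It is not infercnv's implementation: denoise_sketch is a hypothetical helper, the toy matrix is made up, the mean and standard deviation are pooled over all reference values, and replacing in-band values with the reference mean is an assumption about what the "whitening effect" amounts to.

```r
# Hypothetical helper (not part of infercnv): illustrates the documented
# noise band mean(ref) +/- sd(ref) * sd_amplifier on a genes x cells matrix.
denoise_sketch <- function(expr, ref_cols, sd_amplifier = 1.5) {
  ref_vals <- as.vector(expr[, ref_cols, drop = FALSE])
  ref_mean <- mean(ref_vals)   # assumption: pooled over all reference values
  ref_sd <- sd(ref_vals)
  lower <- ref_mean - ref_sd * sd_amplifier
  upper <- ref_mean + ref_sd * sd_amplifier
  # Assumption: values strictly inside the band collapse to the reference mean.
  in_band <- expr > lower & expr < upper
  expr[in_band] <- ref_mean
  list(expr = expr, lower = lower, upper = upper)
}

# Toy, already-centered data: two reference cells and one observed cell.
expr <- matrix(c(-0.1, 0.1, 0.0,    # normal1
                  0.1, -0.1, 0.0,   # normal2
                  0.9, -0.8, 0.05), # tumor1
               nrow = 3,
               dimnames = list(paste0("gene", 1:3),
                               c("normal1", "normal2", "tumor1")))

res <- denoise_sketch(expr, ref_cols = c("normal1", "normal2"))
print(res$expr)
```

In this toy case the strong tumor1 signals for gene1 and gene2 fall outside the band and are kept, while the small gene3 value and all reference values are flattened. A larger sd_amplifier widens the band and removes more signal.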