
View source: R/inferCNV_ops.R


run(): Invokes a routine inferCNV analysis to infer CNV changes given a matrix of RNA-Seq counts.

Description

Performs the actual analysis before calling the plotting functions.

Usage

run(
  infercnv_obj,
  cutoff = 1,
  min_cells_per_gene = 3,
  out_dir = NULL,
  window_length = 101,
  smooth_method = c("pyramidinal", "runmeans", "coordinates"),
  num_ref_groups = NULL,
  ref_subtract_use_mean_bounds = TRUE,
  cluster_by_groups = TRUE,
  cluster_references = TRUE,
  k_obs_groups = 1,
  hclust_method = "ward.D2",
  max_centered_threshold = 3,
  scale_data = FALSE,
  HMM = FALSE,
  HMM_transition_prob = 1e-06,
  HMM_report_by = c("subcluster", "consensus", "cell"),
  HMM_type = c("i6", "i3"),
  HMM_i3_pval = 0.05,
  HMM_i3_use_KS = FALSE,
  BayesMaxPNormal = 0.5,
  sim_method = "meanvar",
  sim_foreground = FALSE,
  reassignCNVs = TRUE,
  analysis_mode = c("subclusters", "samples", "cells"),
  tumor_subcluster_partition_method = c("leiden", "random_trees", "qnorm", "pheight",
    "qgamma", "shc"),
  tumor_subcluster_pval = 0.1,
  k_nn = 20,
  leiden_method = c("PCA", "simple"),
  leiden_function = c("CPM", "modularity"),
  leiden_resolution = "auto",
  leiden_method_per_chr = c("simple", "PCA"),
  leiden_function_per_chr = c("modularity", "CPM"),
  leiden_resolution_per_chr = 1,
  per_chr_hmm_subclusters = FALSE,
  per_chr_hmm_subclusters_references = FALSE,
  z_score_filter = 0.8,
  denoise = FALSE,
  noise_filter = NA,
  sd_amplifier = 1.5,
  noise_logistic = FALSE,
  outlier_method_bound = "average_bound",
  outlier_lower_bound = NA,
  outlier_upper_bound = NA,
  final_scale_limits = NULL,
  final_center_val = NULL,
  debug = FALSE,
  num_threads = 4,
  plot_steps = FALSE,
  inspect_subclusters = TRUE,
  resume_mode = TRUE,
  png_res = 300,
  plot_probabilities = TRUE,
  save_rds = TRUE,
  save_final_rds = TRUE,
  diagnostics = FALSE,
  remove_genes_at_chr_ends = FALSE,
  prune_outliers = FALSE,
  mask_nonDE_genes = FALSE,
  mask_nonDE_pval = 0.05,
  test.use = "wilcoxon",
  require_DE_all_normals = "any",
  hspike_aggregate_normals = FALSE,
  no_plot = FALSE,
  no_prelim_plot = FALSE,
  write_expr_matrix = FALSE,
  write_phylo = FALSE,
  output_format = "png",
  plot_chr_scale = FALSE,
  chr_lengths = NULL,
  useRaster = TRUE,
  up_to_step = 100
)

Arguments

infercnv_obj

An infercnv object populated with raw count data

cutoff

Cutoff for the minimum average read count per gene among reference cells. (default: 1)

min_cells_per_gene

Minimum number of reference cells that must have expression measurements for the corresponding gene to be included. (default: 3)
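The two gene filters above can be illustrated with a minimal base-R sketch. It assumes a genes-by-cells count matrix and a logical vector marking reference cells, and it reads "requiring expression measurements" as "having a nonzero count". The helper `filter_genes_sketch` is hypothetical and is not infercnv's internal code.

```r
# Hypothetical sketch (not infercnv's implementation): keep genes passing both reference-cell filters.
# counts: genes x cells matrix of raw counts
# is_ref: logical vector with one entry per cell, TRUE for reference cells
filter_genes_sketch <- function(counts, is_ref, cutoff = 1, min_cells_per_gene = 3) {
  ref <- counts[, is_ref, drop = FALSE]
  passes_cutoff <- rowMeans(ref) >= cutoff                     # mean count among reference cells
  passes_min_cells <- rowSums(ref > 0) >= min_cells_per_gene   # assumption: "expressed" means count > 0
  counts[passes_cutoff & passes_min_cells, , drop = FALSE]
}

counts <- matrix(c(5, 4, 3, 6, 0, 9,
                   0, 0, 3, 0, 2, 1,
                   2, 2, 2, 2, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:6)))
is_ref <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
rownames(filter_genes_sketch(counts, is_ref))  # geneB is dropped: its reference mean is 0.75
```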

out_dir

Path to the directory where outputs are written. (default: NULL; a non-NULL value must be provided)

## Smoothing params

window_length

Length of the window for the moving average (smoothing). Should be an odd integer. (default: 101)

smooth_method

Method to use for smoothing: "pyramidinal", "runmeans", or "coordinates". (default: "pyramidinal")
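The difference between a running mean and a pyramid-weighted average over a window of genes can be sketched as follows. The assumptions are that genes are already in chromosome order and that the window is truncated at chromosome ends. `smooth_sketch` is a hypothetical helper and the "coordinates" method is not shown, so infercnv's own smoothing code may handle edges differently.

```r
# Hypothetical sketch of two moving-average schemes over genes ordered along one chromosome.
smooth_sketch <- function(x, window_length = 101, method = c("pyramidinal", "runmeans")) {
  method <- match.arg(method)
  stopifnot(window_length %% 2 == 1)  # an odd window keeps each gene at the center
  half <- (window_length - 1) / 2
  # runmeans: uniform weights; pyramidinal: triangular weights 1, 2, ..., half + 1, ..., 2, 1
  weights <- if (method == "runmeans") rep(1, window_length) else c(seq_len(half + 1), rev(seq_len(half)))
  n <- length(x)
  vapply(seq_len(n), function(i) {
    idx <- (i - half):(i + half)
    keep <- idx >= 1 & idx <= n  # truncate the window at the chromosome ends
    sum(x[idx[keep]] * weights[keep]) / sum(weights[keep])
  }, numeric(1))
}

x <- c(0, 0, 3, 0, 0)
smooth_sketch(x, window_length = 3, method = "runmeans")     # 0 1 1 1 0
smooth_sketch(x, window_length = 3, method = "pyramidinal")  # 0 0.75 1.5 0.75 0
```

The pyramid weights keep a single spike closer to its original position than the running mean does.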


num_ref_groups

The number of reference groups or a list of indices for each group of reference indices in relation to reference_obs. (default: NULL)

ref_subtract_use_mean_bounds

If TRUE, determines the mean separately for each reference group, then removes intensities within the bounds of those means. Otherwise, uses the mean of the means across groups. (default: TRUE)
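Here is one way to read the two subtraction modes for a single gene. This is a minimal sketch that assumes values between the lowest and highest reference-group means count as "within bounds" and become 0, while values outside are measured from the nearest bound. `subtract_ref_sketch` is hypothetical, and infercnv operates on the full matrix rather than on one gene.

```r
# Hypothetical sketch of reference subtraction for one gene.
# obs: the gene's values in observation cells
# ref_groups: list of numeric vectors, one per reference group
subtract_ref_sketch <- function(obs, ref_groups, use_bounds = TRUE) {
  group_means <- vapply(ref_groups, mean, numeric(1))
  if (!use_bounds) {
    return(obs - mean(group_means))  # subtract the mean of the group means
  }
  lo <- min(group_means)
  hi <- max(group_means)
  # inside [lo, hi] -> 0; above -> distance from hi; below -> distance from lo
  ifelse(obs > hi, obs - hi, ifelse(obs < lo, obs - lo, 0))
}

ref_groups <- list(c(1, 1), c(3, 3))             # group means 1 and 3
subtract_ref_sketch(c(0, 2, 5), ref_groups)        # -1 0 2
subtract_ref_sketch(c(0, 2, 5), ref_groups, FALSE) # -2 0 3
```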


cluster_by_groups

If observations are defined according to groups (i.e., patients), each group of cells will be clustered separately. (default: TRUE; if FALSE, the k_obs_groups setting is used instead)

cluster_references

Whether to cluster references within their annotations or not. (dendrogram not displayed) (default: TRUE)

k_obs_groups

Number of groups in which to break the observations. (default: 1)

hclust_method

Method used for hierarchical clustering of cells. Valid choices are: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid". (default: "ward.D2")
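The valid hclust_method values are the methods accepted by R's stats::hclust. The sketch below clusters the cells of a toy matrix with "ward.D2" and cuts the tree into k_obs_groups groups. It assumes Euclidean distances between cells, and infercnv's own clustering step may differ in details.

```r
set.seed(1)
# Toy data: 50 genes x 10 cells; cells 6-10 have a shifted expression profile
expr <- cbind(matrix(rnorm(50 * 5, mean = 0), nrow = 50),
              matrix(rnorm(50 * 5, mean = 3), nrow = 50))
colnames(expr) <- paste0("cell", 1:10)

# Cells are columns, so transpose before computing distances between cells
hc <- hclust(dist(t(expr)), method = "ward.D2")

# Break the observations into k_obs_groups groups
k_obs_groups <- 2
groups <- cutree(hc, k = k_obs_groups)
table(groups)
```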

max_centered_threshold

Maximum value allowed after centering; also sets a lower bound of -1 * this value. Can be a numeric value, or "auto" to bound by the mean bounds across cells. Set to NA to turn off. (default: 3)
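For a numeric threshold, the bounding described above amounts to clipping centered values to [-max_centered_threshold, max_centered_threshold]. In this minimal sketch, `bound_centered_sketch` is a hypothetical helper that handles only numeric and NA thresholds. The "auto" mode is not covered.

```r
# Hypothetical sketch: clip centered values to [-threshold, threshold]; NA disables bounding.
bound_centered_sketch <- function(x, max_centered_threshold = 3) {
  if (is.na(max_centered_threshold)) return(x)
  pmin(pmax(x, -max_centered_threshold), max_centered_threshold)
}

bound_centered_sketch(c(-5, -1, 0, 2.5, 4))  # returns c(-3, -1, 0, 2.5, 3)
```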

scale_data

Perform Z-scaling of log-transformed data. (default: FALSE) This may be turned on if your normal and tumor samples are very different kinds of data, for example, when you must use GTEx representative normal expression profiles rather than normal single-cell data from the same experiment.

## Downstream analyses (HMM or non-DE masking) based on tumor subclusters

HMM

When set to TRUE, runs an HMM to predict CNV levels. (default: FALSE)

HMM_transition_prob

transition probability in HMM (default: 1e-6)

HMM_report_by

Level at which HMM results are reported: "subcluster", "consensus", or "cell". (default: "subcluster") Note that reporting is performed entirely separately from the HMM prediction, so you can predict on subclusters but still get per-cell reporting (more voluminous output).

HMM_type

HMM model type, "i6" or "i3". i6: infercnv 6-state model (0, 0.5, 1, 1.5, 2, >2) in which state emissions are calibrated based on simulated CNV levels. i3: infercnv 3-state model (del, neutral, amp) configured based on normal cells and HMM_i3_pval. (default: "i6")
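The following sketch shows what a transition probability such as HMM_transition_prob controls. It builds a generic 6-state transition matrix for the i6 levels, where each state switches to any other state with that probability and otherwise stays put. This uniform-switching design is an assumption for illustration. `make_transition_matrix_sketch` is hypothetical, and using 3 to stand in for ">2" is also an assumption. The matrix infercnv actually uses may be constructed differently.

```r
# Hypothetical sketch: uniform-switching transition matrix for an n-state HMM.
make_transition_matrix_sketch <- function(n_states = 6, HMM_transition_prob = 1e-6) {
  t_mat <- matrix(HMM_transition_prob, nrow = n_states, ncol = n_states)
  diag(t_mat) <- 1 - (n_states - 1) * HMM_transition_prob  # probability of staying in the same state
  t_mat
}

# i6 copy-number levels as listed above (">2" represented by 3, an assumption)
i6_levels <- c(0, 0.5, 1, 1.5, 2, 3)
tm <- make_transition_matrix_sketch(length(i6_levels))
dimnames(tm) <- list(i6_levels, i6_levels)
rowSums(tm)  # every row sums to 1
```

A very small transition probability such as the default 1e-6 strongly favors long runs of genes in the same state.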

HMM_i3_pval

p-value for HMM i3 state overlap (default: 0.05)

HMM_i3_use_KS

Boolean: use the KS test statistic to estimate the mean of the amp/del distributions (as in HoneyBadger). (default: FALSE)

## Filtering low-confidence HMM predictions via BayesNet P(Normal)

BayesMaxPNormal

Maximum P(Normal) allowed for a CNV prediction according to the BayesNet. (default: 0.5; setting it to 0 turns this filter off)
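The BayesMaxPNormal filter can be pictured as dropping every CNV prediction whose posterior P(Normal) exceeds the threshold. In this minimal sketch, `filter_by_pnormal_sketch` and the `cnv_id` and `p_normal` columns are hypothetical names, not infercnv's data structures.

```r
# Hypothetical sketch: keep CNV predictions with P(Normal) <= BayesMaxPNormal; 0 disables filtering.
filter_by_pnormal_sketch <- function(cnv_preds, BayesMaxPNormal = 0.5) {
  if (BayesMaxPNormal == 0) return(cnv_preds)
  cnv_preds[cnv_preds$p_normal <= BayesMaxPNormal, , drop = FALSE]
}

cnv_preds <- data.frame(cnv_id = c("cnv1", "cnv2", "cnv3"),
                        p_normal = c(0.1, 0.7, 0.5),
                        stringsAsFactors = FALSE)
filter_by_pnormal_sketch(cnv_preds)$cnv_id  # cnv2 is removed
```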

sim_method

method for calibrating CNV levels in the i6 HMM (default: 'meanvar')

sim_foreground

Developer option for debugging; do not use.

reassignCNVs

(boolean) Given the CNV associated probability of belonging to each possible state, reassign the state assignments made by the HMM to the state that has the highest probability. (default: TRUE)
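Reassignment picks, for each CNV, the state with the highest probability, which may differ from the state the HMM originally assigned. This is a minimal sketch with a hypothetical probability matrix and state names.

```r
# Hypothetical sketch: one row per CNV, one column per candidate state.
state_probs <- matrix(c(0.10, 0.70, 0.20,
                        0.60, 0.30, 0.10),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("cnv1", "cnv2"), c("state1", "state2", "state3")))
hmm_states <- c(cnv1 = "state3", cnv2 = "state1")  # states originally assigned by the HMM

# Reassign each CNV to its most probable state
reassigned <- colnames(state_probs)[apply(state_probs, 1, which.max)]
names(reassigned) <- rownames(state_probs)
reassigned != hmm_states  # cnv1 changes, cnv2 keeps its state
```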

## Tumor subclustering

analysis_mode

Grouping level for image filtering or HMM predictions: "subclusters", "samples", or "cells". (default: "subclusters") "samples" is fastest, but "subclusters" is ideal.

tumor_subcluster_partition_method

Method for defining tumor subclusters. Options include "leiden", "random_trees", and "qnorm" (the signature also accepts "pheight", "qgamma", and "shc"). leiden: runs a nearest neighbor search, then partitions the resulting graph into communities with the Leiden algorithm. random_trees: slow; uses permutation statistics with tree construction. qnorm: defines the tree height based on the quantile given by tumor_subcluster_pval. (default: "leiden")
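The nearest neighbor search behind the Leiden option can be sketched in base R as a k-nearest-neighbor edge list over cells, using Euclidean distances. The sketch covers only graph construction. The community detection step (available in R, for example, as igraph's cluster_leiden) is not shown, and `knn_edges_sketch` is a hypothetical helper rather than infercnv's code.

```r
# Hypothetical sketch: k-nearest-neighbor edge list over cells (columns of expr).
knn_edges_sketch <- function(expr, k_nn = 20) {
  d <- as.matrix(dist(t(expr)))  # Euclidean distances between cells
  diag(d) <- Inf                 # a cell is not its own neighbor
  n <- ncol(expr)
  k <- min(k_nn, n - 1)
  nbrs <- lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)])
  data.frame(from = rep(seq_len(n), each = k), to = unlist(nbrs))
}

# Four cells along one gene: cells 1-2 and cells 3-4 form two tight pairs
expr <- matrix(c(0, 1, 10, 11), nrow = 1)
knn_edges_sketch(expr, k_nn = 1)
```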

tumor_subcluster_pval

max p-value for defining a significant tumor subcluster (default: 0.1)

k_nn

number k of nearest neighbors to search for when using the Leiden partition method for subclustering (default: 20)

leiden_method

Method used to generate the graph on which the Leiden algorithm is applied, one of "PCA" or "simple". (default: "PCA")

leiden_function

Whether to use the Constant Potts Model (CPM) or modularity in igraph. Must be either "CPM" or "modularity". (default: "CPM")

leiden_resolution

resolution parameter for the Leiden algorithm using the CPM quality score (default: auto)

leiden_method_per_chr

Method used to generate the graph on which the Leiden algorithm is applied for the per chromosome subclustering, one of "PCA" or "simple". (default: "simple")

leiden_function_per_chr

Whether to use the Constant Potts Model (CPM) or modularity in igraph for the per chromosome subclustering. Must be either "CPM" or "modularity". (default: "modularity")

leiden_resolution_per_chr

resolution parameter for the Leiden algorithm for the per chromosome subclustering (default: 1)

per_chr_hmm_subclusters

Run subclustering per chromosome over all cells combined, and run the HMM on those subclusters instead. Only applicable when using Leiden subclustering. This should provide enough resolution in the predictions while avoiding subclusters that are too small and therefore provide less evidence to work with. (default: FALSE)

per_chr_hmm_subclusters_references

Whether the per chromosome subclustering should also be done on references, which should not have as much variation as observations. (default = FALSE)

z_score_filter

Z-score used as a threshold to filter genes used for subclustering. Applied based on reference genes to automatically ignore genes with high expression variability, such as MHC genes. (default: 0.8)

## De-noising parameters

denoise

If TRUE, turns on denoising according to the options below. (default: FALSE)

noise_filter

Values within +/- this amount of the reference cell mean will be set to zero (whitening effect). (default: NA, in which case sd_amplifier below is used instead)

sd_amplifier

Noise is defined as mean(reference_cells) +/- sdev(reference_cells) * sd_amplifier. (default: 1.5)
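The noise band described by noise_filter and sd_amplifier can be sketched as follows. The sketch assumes the data are already centered on the reference cells, so "set to zero" means set to the reference level. It also assumes the reference mean and standard deviation are computed over a single vector of reference values. `denoise_sketch` is a hypothetical helper, and the logistic variant (noise_logistic) is not shown.

```r
# Hypothetical sketch: zero out values inside the noise band around the reference mean.
denoise_sketch <- function(obs, ref, noise_filter = NA, sd_amplifier = 1.5) {
  ref_mean <- mean(ref)
  # An explicit noise_filter takes precedence; otherwise the half-width is sd(ref) * sd_amplifier
  half_width <- if (is.na(noise_filter)) sd(ref) * sd_amplifier else noise_filter
  obs[abs(obs - ref_mean) < half_width] <- 0
  obs
}

ref <- c(-0.2, 0, 0.2)                           # mean 0, sd 0.2, so the band is +/- 0.3
denoise_sketch(c(-1, -0.25, 0.1, 0.5), ref)      # -1 0 0 0.5
```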

noise_logistic

use the noise_filter or sd_amplifier based threshold (whichever is invoked) as the midpoint in a logistic model for downscaling values close to the mean. (default: FALSE)

## Outlier pruning

outlier_method_bound

Method to use for bounding outlier values. (default: "average_bound") Will preferentially use outlier_lower_bound and outlier_upper_bound if set.

outlier_lower_bound

Outliers below this lower bound will be set to this value.

outlier_upper_bound

Outliers above this upper bound will be set to this value.
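Outlier bounding is sketched below. Explicit bounds take precedence. Otherwise, the lower and upper bounds are taken as the averages of the per-cell minima and maxima, which is an assumed reading of "average_bound". `bound_outliers_sketch` is hypothetical and is not infercnv's implementation.

```r
# Hypothetical sketch: cap values to the given bounds, or to averaged per-cell extremes when unset.
bound_outliers_sketch <- function(expr, outlier_lower_bound = NA, outlier_upper_bound = NA) {
  lower <- if (is.na(outlier_lower_bound)) mean(apply(expr, 2, min)) else outlier_lower_bound
  upper <- if (is.na(outlier_upper_bound)) mean(apply(expr, 2, max)) else outlier_upper_bound
  expr[expr < lower] <- lower
  expr[expr > upper] <- upper
  expr
}

expr <- matrix(c(-4, 0, 1,
                 -1, 0, 5), nrow = 3)  # 3 genes x 2 cells
bound_outliers_sketch(expr)            # bounds are mean(-4, -1) = -2.5 and mean(1, 5) = 3
```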

## Misc options

final_scale_limits

Scale limits for the final heatmap output by run(): "auto", or c(low, high). (default: NULL)

final_center_val

Center value for final heatmap output by the run() method.

debug

If TRUE, outputs debug-level logging. (default: FALSE)

num_threads

(int) number of threads for parallel steps (default: 4)

plot_steps

If true, saves infercnv objects and plots data at the intermediate steps.

inspect_subclusters

If true, plot subclusters as annotations after the subclustering step to easily see if the subclustering options are good. (default = TRUE)

resume_mode

leverage pre-computed and stored infercnv objects where possible. (default=TRUE)

png_res

Resolution for PNG output. (default: 300)

plot_probabilities

option to plot posterior probabilities (default: TRUE)

save_rds

Whether to save the current step object results as an .rds file (default: TRUE)

save_final_rds

Whether to save the final object results as an .rds file (default: TRUE)

diagnostics

option to create diagnostic plots after running the Bayesian model (default: FALSE)

## Experimental options

remove_genes_at_chr_ends

Experimental option: if TRUE, removes window_length/2 genes at both ends of each chromosome. (default: FALSE)
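Trimming genes at chromosome ends can be sketched as dropping window_length %/% 2 genes from each end of every chromosome. Integer division is an assumption, since window_length is odd. The sketch also assumes the gene order table is already sorted by position within each chromosome. `remove_chr_end_genes_sketch` and the `gene` and `chr` columns are hypothetical.

```r
# Hypothetical sketch: drop window_length %/% 2 genes from both ends of each chromosome.
remove_chr_end_genes_sketch <- function(gene_order, window_length = 101) {
  trim <- window_length %/% 2
  keep <- unlist(lapply(split(seq_len(nrow(gene_order)), gene_order$chr), function(idx) {
    if (length(idx) <= 2 * trim) integer(0) else idx[(trim + 1):(length(idx) - trim)]
  }), use.names = FALSE)
  gene_order[sort(keep), , drop = FALSE]
}

gene_order <- data.frame(gene = paste0("g", 1:8),
                         chr = rep(c("chr1", "chr2"), c(5, 3)),
                         stringsAsFactors = FALSE)
remove_chr_end_genes_sketch(gene_order, window_length = 3)$gene  # "g2" "g3" "g4" "g7"
```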

prune_outliers

Define outliers loosely as those that exceed the mean boundaries among all cells. These are set to the bounds.

## Experimental options involving DE analysis

mask_nonDE_genes

If TRUE, sets genes not significantly differentially expressed between tumor and normal to the mean value for the complete data set. (default: FALSE)

mask_nonDE_pval

p-value threshold for defining statistically significant DE genes between tumor and normal. (default: 0.05)

test.use

Statistical test to use. (default: "wilcoxon") Alternatives include "perm" or "t".

require_DE_all_normals

If mask_nonDE_genes is set, those genes will be masked only if they are found as DE according to test.use and mask_nonDE_pval in each of the comparisons to normal cells. Options: "any", "most", "all". (default: "any")
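Here is one possible reading of DE-based masking for a single gene. The tumor cells are compared with each normal group using wilcox.test, and the per-comparison DE calls are combined with "any", "most", or "all". If the combined call is not DE, the gene's values are set to the mean of the complete data set. Two parts of this are assumptions: that a gene escapes masking when the rule is satisfied, and that "most" means more than half of the comparisons. `mask_gene_sketch` is hypothetical.

```r
# Hypothetical sketch of masking one non-DE gene.
mask_gene_sketch <- function(tumor, normal_groups, all_values, mask_nonDE_pval = 0.05,
                             require_DE_all_normals = c("any", "most", "all")) {
  rule <- match.arg(require_DE_all_normals)
  # One DE call per normal group (ties trigger a warning about inexact p-values, suppressed here)
  is_de <- vapply(normal_groups, function(normal) {
    suppressWarnings(wilcox.test(tumor, normal)$p.value) < mask_nonDE_pval
  }, logical(1))
  de_call <- switch(rule, any = any(is_de), most = mean(is_de) > 0.5, all = all(is_de))
  if (de_call) tumor else rep(mean(all_values), length(tumor))
}

tumor <- 11:20
normal1 <- 1:10   # clearly different from tumor
normal2 <- 11:20  # identical to tumor
all_values <- c(tumor, normal1, normal2)
mask_gene_sketch(tumor, list(normal1, normal2), all_values, require_DE_all_normals = "any")  # unchanged
mask_gene_sketch(tumor, list(normal1, normal2), all_values, require_DE_all_normals = "all")  # masked
```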

## Other experimental options

hspike_aggregate_normals

instead of trying to model the different normal groupings individually, just merge them in the hspike.

no_plot

don't make any of the images. Instead, generate all non-image outputs as part of the run. (default: FALSE)

no_prelim_plot

don't make the preliminary infercnv image (default: FALSE)

write_expr_matrix

Whether to write text files with the content of matrices when generating plots (default: FALSE)

write_phylo

Whether to write newick strings of the dendrograms displayed on the left side of the heatmap to file (default: FALSE)

output_format

Output format for the figure. Choose between "png", "pdf" and NA. NA means to only write the text outputs without generating the figure itself. (default: "png")

plot_chr_scale

Whether to scale chromosome widths on the heatmap by actual chromosome size rather than by the number of expressed genes. (default: FALSE)

chr_lengths

A named list of chromosome lengths to use when plot_chr_scale=TRUE; otherwise, chromosome size is assumed to be the last chromosome's stop position + 10k bp.

useRaster

Whether to use rasterization for drawing heatmap. Only disable if it produces an error as it is much faster than not using it. (default: TRUE)

up_to_step

Run only up to this exact step number. (default: 100, well above the 23 steps currently in the process)

Value

infercnv_obj containing filtered and transformed data

Examples

library(infercnv)

data(infercnv_data_example)
data(infercnv_annots_example)
data(infercnv_genes_example)

infercnv_object_example <- infercnv::CreateInfercnvObject(raw_counts_matrix=infercnv_data_example, 
                                                          gene_order_file=infercnv_genes_example,
                                                          annotations_file=infercnv_annots_example,
                                                          ref_group_names=c("normal"))

infercnv_object_example <- infercnv::run(infercnv_object_example,
                                         cutoff=1,
                                         out_dir=tempfile(), 
                                         cluster_by_groups=TRUE, 
                                         denoise=TRUE,
                                         HMM=FALSE,
                                         num_threads=2,
                                         analysis_mode="samples",
                                         no_plot=TRUE)


broadinstitute/infercnv documentation built on Nov. 19, 2024, 1:30 a.m.