HiCDCPlus_parallel: HiCDCPlus_parallel

View source: R/HiCDCPlus_parallel.R

HiCDCPlus_parallel    R Documentation

HiCDCPlus_parallel

Description

This function finds significant interactions in a HiC-DC readable matrix using a parallel implementation (socket-based, so it is compatible with Windows). It expresses the statistical significance of counts through the following: 'pvalue': significance P-value, 'qvalue': FDR corrected P-value, 'mu': expected counts, 'sdev': modeled standard deviation of expected counts.

Usage

HiCDCPlus_parallel(
  gi_list,
  covariates = NULL,
  chrs = NULL,
  distance_type = "spline",
  model_distribution = "nb",
  binned = TRUE,
  df = 6,
  Dmin = 0,
  Dmax = 2e+06,
  ssize = 0.01,
  splineknotting = "uniform",
  ncore = NULL
)

Arguments

gi_list

List of GenomicInteractions objects, each named by its chromosome and containing intrachromosomal interaction information (minimally, counts and genomic distance in mcols(gi_list[[1]]); see ?gi_list_validate for a detailed explanation of valid gi_list instances).

covariates

Covariates to be considered in addition to genomic distance D. Defaults to all covariates in mcols(gi) besides 'D', 'counts', 'mu', 'sdev', 'pvalue', 'qvalue'.

chrs

Select a subset of chromosomes, e.g., c('chr21','chr22'). Defaults to all chromosomes in the gi_list.

distance_type

distance covariate form: 'spline' or 'log'. Defaults to 'spline'.

model_distribution

'nb' uses a Negative Binomial model, 'nb_vardisp' uses a Negative Binomial model with a distance-specific dispersion parameter inferred from the data, 'nb_hurdle' uses the legacy HiC-DC model. Defaults to 'nb'.

binned

TRUE if uniformly binned or FALSE if binned by restriction enzyme fragment cutsites

df

degrees of freedom for the genomic distance spline function if distance_type='spline'. Defaults to 6, which corresponds to a cubic spline as explained in Carty et al. (2017)
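To make the df setting concrete, here is a minimal sketch of a cubic B-spline basis with 6 degrees of freedom, built with splines::bs() on made-up distances. It only illustrates what "df = 6" means for a cubic spline; how HiCDCPlus constructs its distance basis internally is not shown here and may differ.

```r
library(splines)

# Toy genomic distances in base pairs (illustrative values, not package data).
D <- seq(5e4, 2e6, by = 5e4)

# Cubic B-spline basis with 6 degrees of freedom.
# bs() uses degree = 3 by default, so df = 6 yields 6 - 3 = 3 interior knots.
basis <- bs(D, df = 6)

dim(basis)            # one row per distance, one column per degree of freedom
attr(basis, "knots")  # interior knots that bs() placed at quantiles of D
```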

Dmin

minimum distance (included) to check for significant interactions, defaults to 0

Dmax

maximum distance (included) to check for significant interactions. Defaults to 2e6 or the maximum distance in the data, whichever is smaller.

ssize

Distance-stratified sampling size. It can be decreased for large chromosomes; increasing it is recommended if the model fails to converge. Defaults to 0.01.

splineknotting

Spline knotting strategy. Either "uniform" (knots uniformly spaced in distance) or "count-based" (knots placed according to the distance distribution of counts, i.e., more closely spaced where counts are denser). Defaults to "uniform".
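The difference between the two knotting strategies can be sketched on toy data. The code below is an illustration under stated assumptions, not the package's implementation: the distances and counts are simulated, n_knots is a hypothetical choice, and the count-based knots are approximated by count-weighted quantiles of distance.

```r
set.seed(1)

# Toy distances (bp) and counts that decay with distance; both are made up.
D <- seq(5e4, 2e6, by = 5e4)
counts <- rpois(length(D), lambda = 200 * exp(-D / 4e5)) + 1L

n_knots <- 3  # hypothetical number of interior knots

# Interior probabilities: drop the 0 and 1 endpoints.
probs <- seq(0, 1, length.out = n_knots + 2)[-c(1, n_knots + 2)]

# "uniform": interior knots evenly spaced between the distance extremes.
uniform_knots <- min(D) + probs * (max(D) - min(D))

# "count-based" (assumed form): knots at quantiles of distance weighted by
# counts, so knots cluster where counts are dense (short distances here).
count_knots <- as.numeric(quantile(rep(D, times = counts), probs = probs))

uniform_knots
count_knots
```

Because the simulated counts concentrate at short distances, the count-based knots sit closer to the origin than the uniform ones.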

ncore

Number of cores to use for parallelization. Defaults to parallel::detectCores()-1.
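As a rough sketch of what a socket-based setup with this default looks like, the snippet below uses the base parallel package. The per-chromosome dispatch and the worker function are assumptions made for illustration; they are not taken from the HiCDCPlus source.

```r
library(parallel)

# Default described above: all detected cores minus one.
# detectCores() can return NA, so fall back to a small value in that case.
available <- detectCores()
if (is.na(available)) available <- 2L
ncore <- max(1L, available - 1L)

# PSOCK (socket) clusters work on Windows as well as Unix-alikes.
# The demo caps the cluster at 2 workers to stay lightweight.
cl <- makeCluster(min(ncore, 2L), type = "PSOCK")

# Hypothetical per-chromosome task standing in for model fitting.
chrs <- c("chr21", "chr22")
res <- parLapply(cl, chrs, function(chr) paste("processed", chr))
names(res) <- chrs

stopCluster(cl)  # always release the workers
res
```

Passing ncore = 1, as in the Examples section, avoids spawning more than one worker, which is convenient on shared machines.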

Value

A valid gi_list instance with additional mcols(.) for each chromosome: 'pvalue': significance P-value, 'qvalue': FDR corrected P-value, 'mu': expected counts, 'sdev': modeled standard deviation of expected counts.
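The relationship between the 'pvalue' and 'qvalue' columns can be illustrated on a toy table. This sketch assumes a Benjamini-Hochberg correction via stats::p.adjust(); the documentation above only says "FDR corrected", so the exact method is an assumption, and the data frame is a made-up stand-in for mcols(gi_list[[chr]]).

```r
# Toy results standing in for the metadata columns of one chromosome.
res <- data.frame(
  counts = c(30, 12, 8, 5),
  mu     = c(10, 9, 7, 5),
  pvalue = c(0.001, 0.02, 0.03, 0.5)
)

# FDR correction (Benjamini-Hochberg assumed for this sketch).
res$qvalue <- p.adjust(res$pvalue, method = "BH")

# Keep interactions passing a 5% FDR threshold (threshold is illustrative).
significant <- res[res$qvalue < 0.05, ]
significant
```

A q-value is never smaller than its P-value, so thresholding on 'qvalue' is the more conservative filter.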

Examples

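library(HiCDCPlus)
# Build a uniformly binned (50 kb) gi_list for chr22
gi_list <- generate_binned_gi_list(50e3, chrs = 'chr22')
# Add counts from the example .hic file shipped with the package
gi_list <- add_hic_counts(gi_list,
    hic_path = system.file("extdata", "GSE63525_HMEC_combined_example.hic",
        package = "HiCDCPlus"))
# Find significant interactions using a single core
gi <- HiCDCPlus_parallel(gi_list, ncore = 1)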

mervesa/HiCDCPlus documentation built on June 8, 2022, 3:43 a.m.