FIREcaller    R Documentation

FIREcaller: an R package for detecting frequently interacting regions from Hi-C data

Description

FIREcaller() is the main function of the FIREcaller package, a user-friendly R package for detecting FIREs (frequently interacting regions) from Hi-C data. With default parameters, FIREcaller takes raw Hi-C NxN contact matrices as input, performs within-sample normalization via HiCNormCis and cross-sample normalization via quantile normalization, and outputs FIRE scores, FIREs, and super-FIREs. Input is either NxN contact matrices in the form of .gz files, or data in .cool or .hic format.

Usage

FIREcaller(file.list=(...), gb=c("hg19","GRCh38","mm9","mm10",""), map_file="",
  nchrom=0, chroms_file="", juicer_tools_version=NULL,
  binsize=c(10000,20000,40000), upper_cis=200000,
  normalized=c("TRUE","FALSE"), filter=c("TRUE","FALSE"),
  rm_mhc=c("TRUE","FALSE"), rm_EBL=c("TRUE","FALSE"), rm_perc=0.25,
  dist=c('poisson','nb'), alpha=0.05, diff_fires=c('TRUE','FALSE'))

Arguments

file.list

a list of files used for FIREcaller. If in .gz format, the naming convention is $prefix.chr$chr#.gz. If in .cool format, the naming convention is $prefix.cool.
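As a rough illustration of the .gz naming convention as written above, the file list for one sample can be assembled with base R. The prefix "Hippo" and the chromosome range 1:22 are assumptions made for this sketch, not values required by the package.

```r
# Hypothetical sample prefix and chromosome set, chosen only for illustration
prefix <- "Hippo"
chrs <- 1:22

# Build names of the form $prefix.chr$chr#.gz
file.list <- paste0(prefix, ".chr", chrs, ".gz")

# Report any expected files that are not present in the working directory
missing_files <- file.list[!file.exists(file.list)]
if (length(missing_files) > 0) {
  message("Missing input files: ", paste(missing_files, collapse = ", "))
}
```

Checking for missing files before calling FIREcaller() is optional; it only makes naming mistakes easier to spot.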

gb

a string that defines the genome build type. If missing, an error message is returned.

map_file

a string that defines the name of the mappability file specific to the sample's genome build, restriction enzyme, and resolution. The file should contain only the chromosomes you want to analyze. See the README for the format.

nchrom

a numeric value for the number of chromosomes. If gb is in c("hg19","GRCh38","mm9","mm10"), nchrom can be omitted, otherwise it must be provided.

chroms_file

a string that defines the name of the file containing the size of each chromosome for the genome build. If gb is in c("hg19","GRCh38","mm9","mm10"), chroms_file can be omitted, otherwise it must be provided. See the README for the format.

juicer_tools_version

a string that defines the version of juicer_tools. It must be provided if the input file is in .hic format, and it should be the full file name of the juicer_tools jar. For example: "juicer_tools.2.20.00.ac.jar".

binsize

a numeric value for the bin size. Default is 40000 (40Kb); the other options are 10000 (10Kb) and 20000 (20Kb).

upper_cis

a bound for the cis-interactions calculation. The default is 200000 (200Kb).
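To make the relationship between upper_cis and binsize concrete, the sketch below converts the bound into a number of bins. It relies on the note in the Examples that a bound which is not a multiple of the bin size is rounded up; the variable n_bins is a name introduced here for illustration and is not part of the package interface.

```r
binsize <- 40000      # 40Kb bins
upper_cis <- 200000   # 200Kb bound on cis-interactions

# Number of bins spanned by the bound, rounded up when upper_cis
# is not an exact multiple of binsize
n_bins <- ceiling(upper_cis / binsize)
n_bins
```

With the defaults this gives 5 bins; a bound of 210000 would be rounded up to 6 bins.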

normalized

a logical value for whether the input matrices are ALREADY normalized. If TRUE, the normalization procedures are skipped. Default=FALSE.

rm_mhc

a logical value indicating whether to remove the MHC region of the sample. Default is "TRUE" if gb is in c("hg19","GRCh38","mm9","mm10"), "FALSE" if not.

rm_EBL

a logical value indicating whether to remove the ENCODE blacklist regions of the sample. Default is "TRUE" if gb is in c("hg19","GRCh38","mm9","mm10"), "FALSE" if not.

rm_perc

the proportion (0-1) of "bad" bins allowed in a cis-interaction calculation, used for filtering. Default is 0.25 (25%).

dist

the distribution used for the HiCNormCis normalization and FIRE score calculation, either 'poisson' or 'nb' (negative binomial). The default is 'poisson'.

alpha

the type I error rate used as the p-value cutoff. Default is 0.05.

diff_fires

a logical value for whether to include the differential FIRE analysis. Replicates must be distinguished by _rep1 or _rep2 in the file name. Default=FALSE.
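Before enabling diff_fires, it can help to confirm that the file names carry the replicate tags. The sketch below uses made-up file names and assumes the tag directly follows the sample label; where the tag must sit in the name is not specified here, so treat the pattern as an assumption.

```r
# Hypothetical file names: two samples with two replicates each
files <- c("A_rep1.cool", "A_rep2.cool", "B_rep1.cool", "B_rep2.cool")

# Flag names that contain a replicate tag such as _rep1 or _rep2
has_rep <- grepl("_rep[0-9]+", files)

# Strip the tag and everything after it to recover the sample label
sample_label <- sub("_rep[0-9]+.*$", "", files)

# Count replicates per sample
reps_per_sample <- table(sample_label[has_rep])
reps_per_sample
```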

Details

The process includes calculating the raw FIRE scores, filtering (with the option of removing the MHC region and ENCODE blacklist regions), HiCNormCis normalization, quantile normalization (if the number of samples > 1), identifying significant FIRE scores, and calling super-FIREs.
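For readers unfamiliar with the cross-sample step, the following is a minimal base R sketch of generic quantile normalization on a toy matrix. It is not the package's implementation; the function name quantile_normalize and the input values are invented for illustration.

```r
# Toy matrix: rows are bins, columns are samples (values are made up)
scores <- cbind(s1 = c(5, 2, 3, 4),
                s2 = c(4, 1, 4, 2),
                s3 = c(3, 4, 6, 8))

quantile_normalize <- function(x) {
  # Rank each column; ties are broken by order of appearance
  ranks <- apply(x, 2, rank, ties.method = "first")
  # The mean of the sorted values across samples is the reference distribution
  ref <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value at its rank
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

scores_qn <- quantile_normalize(scores)
scores_qn
```

After this step every sample shares the same set of values, so scores are comparable across samples.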

Value

By default, two sets of files are returned. The total number of output files is 1 + (number of prefixes/samples):

  • Fire: a single text file is output for all samples and all chromosomes. This file contains the FIRE score, the associated -ln(p-value), and an indicator of whether the region is a FIRE, I(-ln(p-value) > -ln(0.05)).

  • SuperFire: a text file for each sample with a list of Super Fires and corresponding -log10(pvalue).
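The FIRE indicator can be illustrated with a few made-up p-values. This sketch assumes the indicator compares -ln(p-value) against -ln(alpha), which is equivalent to p-value < alpha; the variable names are hypothetical and do not correspond to columns in the output file.

```r
# Made-up p-values for four bins
pvalues <- c(0.001, 0.04, 0.05, 0.6)
alpha <- 0.05

# log() is the natural logarithm in R
neg_ln_p <- -log(pvalues)

# A bin is flagged as a FIRE when -ln(p-value) exceeds -ln(alpha)
is_fire <- as.integer(neg_ln_p > -log(alpha))
is_fire
```

Because the comparison is strict, a bin with a p-value exactly equal to alpha is not flagged.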

Note

Mappability files are available at https://yunliweb.its.unc.edu/FIREcaller/

Author(s)

Crowley, Cheynna Anne <cacrowle@live.unc.edu>, Yuchen Yang <yyuchen@email.unc.edu>, Ming Hu <afhuming@gmail.com>, Yun Li <yunli@med.unc.edu>

References

Cheynna Crowley, Yuchen Yang, Ming Hu, Yun Li. FIREcaller: an R package for detecting frequently interacting regions from Hi-C data

See Also

Paper: https://doi.org/10.1016/j.celrep.2016.10.061 ; https://doi.org/10.1101/619288

Examples

# set working directory: the location of the NxN matrices and mappability files
setwd('~/Documents/Schmitt_Hippo_40KB_input/')

# define the file names. If in .gz format, the naming convention is ${prefix}.chr${chr#}.gz. If in .cool format, the naming convention is ${prefix}.cool
file.list <- c(paste0('Hippo_chr',1:22,'.gz'))

# define the genome build
gb<-'hg19'

# define the name of the mappability file
map_file<-'Hind3_hg19_40Kb_encodeBL_F_GC_M_auto.txt.gz'
    
# define the nchrom (can be omitted in this example since the package has included the chromosome size data for genome build 'hg19')
nchrom<-23
    
# define the name of the chromosome size file (can be omitted in this example since the package has included the chromosome size data for genome build 'hg19')
# chroms_file<-NULL
  
# define the version of juicer_tools if the input file is in .hic format, otherwise it is not required. It should be the full name of the juicer_tools.
# juicer_tools_version<-NULL

# define the binsize. Default=40000 (40Kb). Other recommended bin sizes are 10000 (10Kb) and 20000 (20Kb).
binsize<-40000

# define the upper bound of the cis-interactions; default=200000 (200Kb); if not a multiple of the bin, then takes the ceiling;
upper_cis<-200000

# define whether the input matrix is ALREADY normalized. Default=FALSE. If TRUE, the within-sample normalization step is skipped.
normalized<-FALSE

# define whether to remove MHC region; Default=TRUE
rm_mhc <- TRUE

# define whether to remove ENCODE blacklist region; Default=TRUE
rm_EBL<- TRUE

# define the proportion of problematic bins allowed in the cis-interaction calculation (0-1); Default is 0.25 (25%).
rm_perc<-0.25

# define whether the distribution should be poisson or negative binomial; Default=Poisson.
dist<-'poisson'

# define the alpha cut off for a significant p-value. Default=0.05.
alpha<-0.05

# define if a circos plot should be created of FIREs and super-FIREs; Default=FALSE
plots<-FALSE

# specify whether differential FIREs should be calculated between 2 samples, with at least 2 replicates per sample; Default=FALSE
diff_fires<-FALSE

# run the function
FIREcaller(file.list, gb, map_file, nchrom, chroms_file=NULL, juicer_tools_version=NULL,
           binsize=binsize, upper_cis=upper_cis, normalized=normalized,
           rm_mhc=rm_mhc, rm_EBL=rm_EBL, rm_perc=rm_perc, dist=dist, alpha=alpha,
           plots=plots, diff_fires=diff_fires)


yycunc/FIREcaller documentation built on Nov. 13, 2022, 7:49 p.m.