construct_features_parallel: construct_features_parallel

View source: R/construct_features_parallel.R

construct_features_parallel    R Documentation

construct_features_parallel

Description

This function lists all restriction enzyme cut sites of a given genome and genome version, together with the genomic features outlined in Carty et al. (2017) https://www.nature.com/articles/ncomms15454: GC content, mappability, and effective length.

Usage

construct_features_parallel(
  output_path,
  gen = "Hsapiens",
  gen_ver = "hg19",
  sig = "GATC",
  bin_type = "Bins-uniform",
  binsize = 5000,
  wg_file = NULL,
  chrs = NULL,
  feature_type = "RE-based",
  ncore = NULL
)

Arguments

output_path

path to the folder, followed by the file name prefix, where the feature files are to be written. The feature file will have the suffix '_bintolen.txt.gz'.

gen

name of the species: e.g., default 'Hsapiens'.

gen_ver

genomic assembly version: e.g., default 'hg19'.

sig

restriction enzyme cut pattern (or a vector of patterns; e.g., 'GATC' or c('GATC','GANTC')).

bin_type

'Bins-uniform' if uniformly binned by binsize in bp, or 'Bins-RE-sites' if binned by number of restriction enzyme fragments.

binsize

binsize in bp if bin_type='Bins-uniform' (or number of RE fragment cut sites if bin_type='Bins-RE-sites'), defaults to 5000.

wg_file

path to the bigWig file containing mappability values across the genome of interest.

chrs

select a subset of chromosomes, e.g., c('chr21','chr22'). Defaults to all chromosomes (except Y and M) in the genome specified.

feature_type

'RE-based' if features are to be computed based on restriction enzyme fragments. 'RE-agnostic' ignores restriction enzyme cutsite information and computes features gc and map based on binwide averages. bin_type has to be 'Bins-uniform' if feature_type='RE-agnostic'.

ncore

Number of cores to use for parallelization. Defaults to parallel::detectCores()-1.
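
To illustrate what binning "uniformly by binsize" with "binwide averages" can look like (bin_type='Bins-uniform', feature_type='RE-agnostic'), here is a minimal base R sketch on a toy sequence. The sequence, the variable names, and the coordinate convention (1-based, inclusive, last bin truncated at the sequence end) are assumptions made for illustration; the package's own implementation may differ.

```r
# Toy sequence standing in for a chromosome (hypothetical data)
seq_toy <- "GGCCAATTGCGCATATGG"
binsize <- 5

# Uniform bins: consecutive windows of `binsize` bases
chr_len <- nchar(seq_toy)
starts <- seq(1, chr_len, by = binsize)
ends <- pmin(starts + binsize - 1, chr_len)  # the last bin may be shorter

# Bin-wide GC content: fraction of G or C bases within each bin
bases <- strsplit(seq_toy, "")[[1]]
gc <- mapply(function(s, e) mean(bases[s:e] %in% c("G", "C")), starts, ends)

bins <- data.frame(start = starts, end = ends, gc = gc)
print(bins)
```

With an 18-base sequence and binsize = 5, this yields four bins, the last of which covers only three bases.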

Value

A 'bintolen' features file that contains GC content, mappability, and length features.

Examples

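# Use a temporary directory as the output prefix (the trailing slash makes it a folder)
outdir <- paste0(tempdir(check = TRUE), '/')
# Build features for chr21 of hg19 in uniform 100 kb bins, using two cut patterns
construct_features_parallel(output_path = outdir, gen = 'Hsapiens',
    gen_ver = 'hg19', sig = c('GATC', 'GANTC'), bin_type = 'Bins-uniform',
    binsize = 100000, wg_file = NULL, chrs = c('chr21'), ncore = 2)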
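
The example above passes two cut patterns, 'GATC' and 'GANTC'. As a rough sketch of what locating such cut sites involves, the base R code below scans a toy sequence for both patterns. It assumes that N stands for any base (as in IUPAC notation); the helper names to_regex and find_sites and the toy sequence are hypothetical and are not part of HiCDCPlus, whose actual matching procedure may differ.

```r
# Toy sequence (hypothetical, not taken from any genome)
seq_toy <- "AAGATCCGAATCTTGACTCGATC"

# Translate the ambiguity code N into a regular-expression character class
to_regex <- function(pattern) gsub("N", "[ACGT]", pattern, fixed = TRUE)

# Return the sorted start positions of all occurrences of any pattern
find_sites <- function(sequence, patterns) {
  hits <- lapply(patterns, function(p) {
    # A lookahead reports overlapping occurrences as well
    m <- gregexpr(paste0("(?=", to_regex(p), ")"), sequence, perl = TRUE)[[1]]
    as.integer(m[m > 0])  # gregexpr returns -1 when there is no match
  })
  sort(unique(unlist(hits)))
}

sites <- find_sites(seq_toy, c("GATC", "GANTC"))
print(sites)
```

Here 'GATC' matches at positions 3 and 20, and 'GANTC' matches at positions 8 and 15, so the combined site list has four entries.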

mervesa/HiCDCPlus documentation built on June 8, 2022, 3:43 a.m.